Thursday, June 10, 2010

Oracle Configuration Manager (OCM): incomplete database recovery problem

Oracle Configuration Manager (OCM) centralizes configuration information based on your Oracle technology stack. Oracle uses secure access to your configuration information to help you achieve problem avoidance, faster problem resolution, better system stability, and easier management of your Oracle systems. Its benefits are:
  • Faster problem resolution from integrating your configuration information into the service request flow, giving Oracle Support the information they need in real time to resolve your problem quickly and efficiently.
  • Improved systems stability delivered through proactive advice & health checks driven by Oracle best practices and personalized to your system configuration.
  • Simplified configuration management from a single, comprehensive and personalized dashboard of configurations, projects and inventory.
To enable this feature, you have to install a small piece of software (downloadable from Oracle Metalink; the current version is 10.3.3) in the ORACLE_HOME where the database resides. OCM does not affect the database or the host where it is installed in any way.

The problem

After a successful installation and a period of correct operation, I was forced to perform an incomplete (point-in-time, PIT) recovery of a database where OCM was active. Very soon afterwards, my Oracle Metalink dashboard showed problems with OCM: the last data collection had been sent two weeks earlier.

The analysis

The analysis consists of three commands, listed separately here for readability:
$ORACLE_HOME/ccr/bin/emCCR status
$ORACLE_HOME/ccr/bin/emCCR collect
$ORACLE_HOME/ccr/bin/emCCR disable_target
The point of running "$ORACLE_HOME/ccr/bin/emCCR disable_target" is not to disable any target but to see which targets are currently active, which is why you press ENTER to exit the command without making changes.
Here is the output:
$ORACLE_HOME/ccr/bin/emCCR status
[oracle PROD4@server4 ~]$ $ORACLE_HOME/ccr/bin/emCCR status
Oracle Configuration Manager - Release: - Production
Copyright (c) 2005, 2009, Oracle and/or its affiliates.  All rights reserved.
Start Date               27-Apr-2010 14:15:03
Last Collection Time     17-May-2010 14:10:00
Next Collection Time     18-May-2010 14:10:00
Collection Frequency     Daily at 14:10
Collection Status        idle
Log Directory            /u01/PROD/proddb/10.2.0/ccr/hosts/server4/log
Registered At            26-Jan-2010 14:10:39
Automatic Update         On
Collector Mode           Connected
[oracle PROD4@server4 ~]$

$ORACLE_HOME/ccr/bin/emCCR collect
[oracle PROD4@server4 ~]$ $ORACLE_HOME/ccr/bin/emCCR collect
Oracle Configuration Manager - Release: - Production
Copyright (c) 2005, 2009, Oracle and/or its affiliates.  All rights reserved.
Collection and upload done.
[oracle PROD4@server4 ~]$

$ORACLE_HOME/ccr/bin/emCCR disable_target
[oracle PROD4@server4 ~]$ $ORACLE_HOME/ccr/bin/emCCR disable_target
Oracle Configuration Manager - Release: - Production
Copyright (c) 2005, 2009, Oracle and/or its affiliates.  All rights reserved.
No.        Category                     Target Name
0          Cluster                      CRSPROD
1          Host                         server4
2          Database Instance            PROD_CRSPROD_PROD4
3          Database Instance            PROD_CRSPROD_PROD3
4          Database Instance            PROD_CRSPROD_PROD2
5          Database Instance            PROD_CRSPROD_PROD1
6          Oracle Home                  OraPROD10g_home
7          Listener                     LISTENER_PROD4_ADMIN_server4
8          Listener                     LISTENER_PROD_IISPRODRAC4_server4
9          Oracle Configuration Manager Oracle Configuration Manager
10         Cluster Database             PROD_CRSPROD

Press Enter to exit the command.
Use Comma to separate multiple target numbers.
Enter the number(s) corresponding to the target(s) you wish to disable:

[oracle PROD4@server4 ~]$
Everything looks OK.
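
To check the collection dates from a script instead of reading them by eye, a small helper can pull a single field out of the status output. This is only a sketch: status_field is a hypothetical name of mine, and it assumes the "label, spaces, value" layout shown in the output above.

```shell
#!/usr/bin/env bash
# status_field LABEL
# Reads `emCCR status` output on stdin and prints the value that follows
# the given label (hypothetical helper, not part of OCM).
status_field() {
    awk -v key="$1" '
        index($0, key) == 1 {                    # line starts with the label
            value = substr($0, length(key) + 1)  # text after the label
            sub(/^[ \t]+/, "", value)            # drop the padding spaces
            print value
        }'
}

# Example usage on a host where OCM is installed:
#   $ORACLE_HOME/ccr/bin/emCCR status | status_field "Last Collection Time"
```

Because the function only reads standard input, it can also be run against a saved copy of the status output.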

Next I looked at the "state" directory and checked the file dates:
ls -alt $ORACLE_HOME/ccr/state
If the collectors are active, the modification dates of the files there should be current ... and they were!
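
The same date check can be done with find, which lists only the files changed within a given number of days. This is a minimal sketch: recent_state_files is a hypothetical name, and the one-day default is my assumption based on the daily collection frequency shown in the status output.

```shell
#!/usr/bin/env bash
# recent_state_files DIR [DAYS]
# Prints the regular files under DIR modified less than DAYS*24 hours ago
# (DAYS defaults to 1). Hypothetical helper, not part of OCM.
recent_state_files() {
    find "$1" -type f -mtime "-${2:-1}" -print
}

# Example usage:
#   recent_state_files "$ORACLE_HOME/ccr/state" 1
```

An empty result would mean that nothing in the state directory has been touched recently, which points at inactive collectors.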

So I checked another thing: the $ORACLE_HOME/ccr/hosts/* directories on all 4 nodes of the RAC. Under this directory there is one subdirectory per hostname, which holds all the OCM data for that $ORACLE_HOME. The most interesting part is the log directory, where I found the following in sched.log:
2010-04-22 23:30:04: Oracle Configuration Manager successfully started.
2010-05-26 14:05:11, [ERROR]: Failure while performing scheduled collection directive. Collector returned with following error: Error encountered attempting to upload to Oracle Configuration Manager content receiver
Error sending data []
My PIT recovery took place on the 22nd of May, so I connected the two events. With the cause confirmed, it was time to repair the damage.
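
To line up log errors with the recovery date, the [ERROR] entries in sched.log can be filtered by date. This is a sketch only: errors_since is a hypothetical name, and it assumes every entry starts with a YYYY-MM-DD timestamp, as in the excerpt above, so that plain string comparison sorts the dates correctly.

```shell
#!/usr/bin/env bash
# errors_since LOGFILE YYYY-MM-DD
# Prints the [ERROR] lines of LOGFILE dated on or after the given day.
# Continuation lines without a timestamp are not printed.
errors_since() {
    awk -v since="$2" '/\[ERROR\]/ && $1 >= since' "$1"
}

# Example usage (path taken from the "Log Directory" line of emCCR status):
#   errors_since /u01/PROD/proddb/10.2.0/ccr/hosts/server4/log/sched.log 2010-05-22
```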

The unsuccessful solution

When emCCR is not working properly (I had been in such situations before), my approach was to re-instrument OCM. This takes 2 straightforward steps.
  1. On every database instance configured for collection:

    cd $ORACLE_HOME/ccr/admin/scripts 
    ./ collectconfig -s $ORACLE_SID
    Successfully installed collectconfig in the database with SID=PROD1.
    Successfully installed collectconfig in the database with SID=PROD2.
    Successfully installed collectconfig in the database with SID=PROD3.
    Successfully installed collectconfig in the database with SID=PROD4.
  2. Then, on one node only, force a collection to be submitted to Metalink:

    $ORACLE_HOME/ccr/bin/emCCR collect 
After 2 days of waiting, I realized that this approach had not produced the expected result: my dashboard was still showing errors about the old collection date.

The solution

Reading through the OCM manuals, I found that a drop/recreate approach would cause no problems for data or performance, so that became my solution. Here are the steps:
  1. As SYSDBA, from one node, remove the Oracle Configuration Manager user and the associated objects from the database:
    cd $ORACLE_HOME/ccr/bin
    ./emCCR stop
    SQL> @/$ORACLE_HOME/ccr/admin/scripts/dropocm.sql;
  2. Stop the Scheduler and remove the crontab entry:
    $ORACLE_HOME/ccr/bin/deployPackages -d $ORACLE_HOME/ccr/inventory/core.jar
  3. Delete the ccr directories on all nodes
    rm -rf $ORACLE_HOME/ccr
  4. This step is optional, but I strongly advise it: download the latest OCM from the My Oracle Support dashboard.
  5. As the oracle user, extract the zip file into $ORACLE_HOME. This creates a new $ORACLE_HOME/ccr directory.
  6. Install OCM on all nodes (general form first, followed by an example):

    $ORACLE_HOME/ccr/bin/setupCCR -s CSI MOSaccount

    $ORACLE_HOME/ccr/bin/setupCCR -s 12345678
  7. Instrument the database, only on node1

    $ORACLE_HOME/ccr/admin/scripts/ collectconfig -s $ORACLE_SID
  8. Force a collection and upload, from each node

    $ORACLE_HOME/ccr/bin/emCCR collect
And after some time (2 days or less) you'll see that OCM is collecting data as before.
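
The last step, forcing a collection on each node, can be looped over ssh instead of logging in to every host. This is a rough sketch under several assumptions of mine: the node names are placeholders, passwordless ssh works for the oracle user, and ORACLE_HOME is set in the remote non-interactive shell. The collect_on_all_nodes function and the DRY_RUN switch are hypothetical, not part of OCM.

```shell
#!/usr/bin/env bash
# Placeholder node names; replace them with your own RAC hosts.
NODES="${NODES:-rac1 rac2 rac3 rac4}"

# collect_on_all_nodes
# Runs "emCCR collect" on every host in $NODES. With DRY_RUN=1 it only
# prints the commands it would run.
collect_on_all_nodes() {
    local node
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh $node \$ORACLE_HOME/ccr/bin/emCCR collect"
        else
            # Single quotes: $ORACLE_HOME must expand on the remote host.
            ssh "$node" '$ORACLE_HOME/ccr/bin/emCCR collect' \
                || echo "collect failed on $node" >&2
        fi
    done
}

# Example usage:
#   DRY_RUN=1 collect_on_all_nodes    # show the commands first
#   collect_on_all_nodes              # then run them
```

Running it with DRY_RUN=1 first is a cheap way to confirm the node list before anything is executed.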

The End

Why is an incomplete recovery a problem for OCM? Was it specific to this database version (PSU 3) or to the OCM version (I was on 10.3.2 before)? I do not know.

But since recreating OCM is not harmful in any way, it is a reasonable step when nothing else helps, or when you need to work around bugs inside OCM.


Copyright © 2009-2018 Damir Vadas

All rights reserved.