Monday, February 22, 2010

How to find correct SCN?

Recent online help to certain Mr. Muthu, who had problems with database recover, force me to write this blog topic. Even thought there are many problematic situations in this area one is very common and most often. So I'll try to explain what is the right path to solve such a situation (if it can be solved!).

The scenario

To describe starting situation, let us suppose this restore/recover scenario (we are talking about incomplete recover).
  1. startup nomount
  2. restore control file
  3. alter database mount
  4. restore database
Next step should be database recover (with or without until clause). But this action finishes with:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of alter db command at 01/11/2010 12:08:19
ORA-01152: file 1 was not restored from a sufficiently old backup 
ORA-01110: data file 1: '+DATA/hac/datafile/system.366.708004385'
So we are stacked because recover could not be consistent. The main problem is that restored data files need ALL to be recovered at least to largest SCN number+1.

This situation is explained world wide but main main question is: what is the lowest SCN that can recover database enough to be open?

Solution-RMAN list backup

First solution is to look in RMAN repository or use control file records to retrieve backup of datafile 1. In a case of RMAN catalog you'll not have too much trouble because you'll find enough information to see backup of wanted datafile and it's SCN. Output may looks like:
RMAN> list backup of datafile 1;

List of Backup Sets
===================

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
645     Full    127.30M    DISK        00:01:02     06-DEC-09
        BP Key: 645   Status: AVAILABLE  Compressed: YES  Tag: TAG20091206T183018
        Piece Name: +BCKP/hac/backupset/2009_12_06/tag20091206t183018_0.7966.704917821
  List of Datafiles in backup set 645
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  1       Full 35728924   06-DEC-09 +DATA/hac/datafile/system.282.694017165
...
If you are lucky and get result look in column "Ckp SCN" and you'll get missing SCN. Unfortunately this may not help you always.

In nocatalog mode it is more then likely that your command may retrieve no data:
RMAN> list backup of datafile 1;


RMAN>
No backup! The reason is no backup was taken in time provided with control_file_record_keep_time parameter.

Solution-RMAN list copy

Next solution is usage of RMAN's "list copy", which shows only backup of "datafile 1", with the same retention as described in previously explained in "List backup" solution. However if you are lucky and get result it is more or less like:
RMAN> list copy of datafile 1;

List of Datafile Copies
Key     File S Completion Time Ckp SCN    Ckp Time        Name
------- ---- - --------------- ---------- --------------- ---------------------------------------------
1917    1    A 08-JAN-10       49152352   08-JAN-10       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_nvl2urjr
1867    1    X 07-JAN-10       49125664   07-JAN-10       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_m4l2sii8
1841    1    X 07-JAN-10       49117957   07-JAN-10       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_l7l2sb2u
1813    1    X 18-SEP-09       9643175    18-SEP-09       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_edkpitiv
1666    1    A 17-SEP-09       9174432    17-SEP-09       /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_ahkpfgq1

Again, regardless shown several SCNs this might not be enough to help you.

Solution-use x$ (C based) tables

x$ tables are Oracle's core tables based on pure C data and should be handle with additional care. The beauty of them is they are available when database in mounted and when all "dba_*" views are not available as well as some v$ based ones.

To see status of all restored datafiles can be done with this SQL
set linesize 200;
set pagesize 100;
col inst_id for 9999999 heading 'Instance #'
col file_nr for 9999999 heading 'File #'
col file_name for A50 heading 'File name'
col checkpoint_change_nr for 99999999999999 heading 'Checkpoint #'
col checkpoint_change_time for A20 heading 'Checkpoint time'
col last_change_nr for 99999999999999 heading 'Last change #'
SELECT
      fe.inst_id,
      fe.fenum file_nr,
      fn.fnnam file_name,
      TO_NUMBER (fe.fecps) checkpoint_change_nr,
      fe.fecpt checkpoint_change_time,
      fe.fests last_change_nr,
      DECODE (
              fe.fetsn,
              0, DECODE (BITAND (fe.festa, 2), 0, 'SYSOFF', 'SYSTEM'),
              DECODE (BITAND (fe.festa, 18),
                      0, 'OFFLINE',
                      2, 'ONLINE',
                      'RECOVER')
      ) status
FROM x$kccfe fe,
     x$kccfn fn
WHERE    (   (fe.fepax != 65535 AND fe.fepax != 0 )
          OR (fe.fepax = 65535 OR fe.fepax = 0)
         )
     AND fn.fnfno = fe.fenum
     AND fe.fefnh = fn.fnnum
     AND fe.fedup != 0
     AND fn.fntyp = 4
     AND fn.fnnam IS NOT NULL
     AND BITAND (fn.fnflg, 4) != 4
ORDER BY fe.fenum
; 
The output might looks like:
Instance #   File # File name                     Checkpoint# Checkpoint time      Last change #  STATUS
---------- -------- ----------------------------- ----------- -------------------- -------------- ------
         2        1 /u01/system.312.711642293         891734 02/25/2010 10:24:21   891734         SYSTEM
         2        2 /u01/undotbs1.311.711642299       891734 02/25/2010 10:24:21   891734         ONLINE
         2        3 /u01/sysaux.310.711642299         891734 02/25/2010 10:24:21   891734         ONLINE
         2        4 /u01/undotbs2.307.711642303       891734 02/25/2010 10:24:21   891734         ONLINE
         2        5 /u01/undotbs3.306.711642305       891734 02/25/2010 10:24:21   891734         ONLINE
         2        6 /u01/users.315.711642305          891734 02/25/2010 10:24:21   891734         ONLINE

6 rows selected.
Suppose that real 891734 represents imaginary "xxxxxxxx" number. As you see it represents SCN numbers of each datafile's time when it was last changed. These "xxxxxxxx" numbers are all probably different (opposite in this case). This is tipical representation of inconsistent backup.

So your job is to find record where "File #" column value is equal "1". This is usually first record named like "system.yyy" and check it's last change SCN. And this is the number you are looking for.

Remember that SCN for recover is always xxxxxxxx+1, for one bigger then one found in list!

To cover the whole story, for successful recover of any backup set like that, you need to apply archive logs which's SCN's are between min(xxxxxxxx)-max(xxxxxxxx)+1 but you do not need to specify min(xxxxxxxx) because RMAN will know that automatically.

So your restore should use until SCN clause in RMAN script. In pseudo language that looks like:
startup nomount;
restore controlfile;
alter database mount;

set until scn xxxxxxxx+1;

restore database;
recover database;

alter database open resetlogs;

If all SCNs from previous query are the same then (as they are in our case) then you are dealing with consistent (aka cold backup). Cold backup means that all datafiles are having same SCN so practicly they do not need recover but only as declarative term (you do not need any archived log file beside). In a cases like that, your restore RMAN script might looks like:
restore database;
recover database noredo;
alter database open resetlogs;

However, in some consistent backup cases, I remember that I found on some database version a bug as explained at the begginig (ORA-01152). Based on previously shown real numbers, your "full proof" restore script for consistent backup, should look like:
set until scn=891735;
restore database;
recover database noredo;
alter database open resetlogs;
So there is no affraid of use until clause even in restoring consistent backups. This is especially the case in "backup as copy" when you deal with backup outside of ASM storage.

Solution-RMAN restore database preview

To prevent searching for information in restore/recover time, there is elegant method that will tell you which SCN is needed for successful restore.

After regular database backup, use MAN command which will collect and show all important data for this backupset.
backup database;
restore database preview;
At the end of command shows exact SCN that you need for successful recover. At the end of log file output is like:
Media recovery start SCN is xxxxxxxxx 
Recovery must be done beyond SCN xxxxxxxxx to clear data files fuzziness
Finished restore at 17-SEP-09
released channel: t1
released channel: t2

RMAN> 

Recovery Manager complete.
where xxxxxxxxx+1 represent SCN you are searching for.

Cheers!

20 comments :

  1. Very nice. So clear and easy to understand the main point. Appreciate a lot.
    Leika

    ReplyDelete
  2. Thank you Leika.
    Glad you'd find it useful.

    ReplyDelete
  3. When I run a RESTORE PREVIEW, it ends with:

    Media recovery start SCN is 2483662717
    Recovery must be done beyond SCN 2484364327 to clear datafile fuzziness
    Finished restore at 25-MAY-2011 22:03:08

    Why are the SCNs different here?
    Since this was a one off special bkup taken to disk (not tsm which is typical), how do I verfiy I have all my archive logs on disk before I begin restore & recovery?

    ReplyDelete
  4. How do you obtain SCN in your post?

    ReplyDelete
  5. "Media recovery start SCN is 2483662717"
    This is starting SCN
    When backup was taking place, SCN are changed so when backup ends, then SCN was "2484364327".
    This is the number of SCN that you must have in archived redo logs to have valid backup.

    ReplyDelete
  6. Thank you. This post helped me do a successful restore.

    ReplyDelete
  7. Appreciated your efforts!! Thanks A lot for writing this block.

    ReplyDelete
  8. one question, do we need all the time to be increased the scn by 1 , suppose if the scn is 12345 while recovering the db should it be 12346? also, what about if I go through the recovery using time stamp?

    ReplyDelete
  9. Hi, can you explain how you got to these two tables/views?
    FROM x$kccfe fe,
    x$kccfn fn

    ReplyDelete
    Replies
    1. searching internet .... and looking on many places...

      Delete
  10. Hi Damir, what about this scenario: We have a level 0 backup that starts on day 1 at 6:00 am and runs for over 30 hours, ending on day 2 at 13:00, during the time no archive log backups are taken but the LV0 includes archive log all and deletes them.

    Now, we need to restore the DB back to day 1 at 14:00, obviously this LV0 would not help since it had not finished by then; but we do have previous LV0, previous LV1 and all archive logs to roll forward.

    Question is: what is the best place to find the exact SCN to use here (the closest to day 1 14:00+), we don't have much info on the catalog because no archive logs were backed up during the LV0, so the RC_ARCHIVE_LOG view does not help much, we could use the v$log_history and list backup, and a few other places, but is there a better and more accurate way? I like the idea of using the x$ tables, although even querying them might be dangerous sometimes, but probably more accurate.

    Any thoughts and guidance would be very much appreciated, you info here is great and I have enjoyed reading it very much. Keep up the good work.

    Thanks

    Orlando.

    ReplyDelete
    Replies
    1. Your restore must begin with backup from time when backup completed BEFORE restore point. From that moment, use archived logs (probably from both backups) and come to desired restore moment.
      To define restore, you do not have to use SCN but also timestamp as well. here are some examples of that:
      SET UNTIL SCN 1000;
      # Alternatives:
      SET UNTIL TIME 'Nov 15 2004 09:00:00';
      SET UNTIL SEQUENCE 9923;
      If you want to stay in SCN terminology, look for TIMESTAMP_TO_SCN function (http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions176.htm)
      Hope this helps,
      Damir

      Delete
    2. Thanks for the quick response Damir, yes this helps. Now, using until time might have some issues due to “Bug 11694127” which might truncate the time part of the date, in fact the workaround Oracle offers (Doc ID 11694127.8) does not work consistently, I have tested that a few times.

      Due to that, yes I always use SCN for all my restores and duplicates; also, the TIMESTAMP_TO_SCN does not work all the time either, because as you know, some of the SCN’s returned might not be related to a log switch but for some other checkpoint activity, so I have seen RMAN trying to honor an SCN and no finding proper logs to apply, although all logs are there.

      Finally, the sequence values also help a lot, but when dealing with RAC environments, it is a bit confusing for some people because each instance normally have a different set of numbers and on top of that they also have multiple treads so it is not that easy to come up with the exact SCN.

      What we are trying to do is to have a very solid and bullet proof method and hopefully a small script that will take a time stamp as parameter and return that exact SCN we need every single time, whether we use a catalog or just control file.

      I have a very simple script that works most of the times, but for the long running scenario I described earlier, it might not be provide what we need and it only works with catalogs. As soon as I come up with a better one I’ll let you know.

      Nice talking to you again.

      Delete
    3. Thanks for sharing this info. Looking forward to see that script-appreciated in front.
      But now come to mine mind...
      If you want something bullet proof restore methods with SCN, maybe, just maybe, idea to place some logging activities in PLSQL which will save data to plain file (with action name, time and SCN ... and thread as well in RAC case ... anything you need), before any interesting activity over any kind of batch or mass processing.
      This would be really huge help to get proper values for any restore in the future.
      Hope this helps,
      Damir

      Delete
    4. Here is an example how to create autonomous backup,where you can see how to gather all important values youu need for restore and cover RAC version
      http://damir-vadas.blogspot.com/2009/10/autonomous-rman-online-backup.html
      Cheers
      Damir

      Delete
    5. Good point, in fact I’ve been thinking of incorporating “Restore Points” as part of my backups, without using flashback. Of course that would not work for my initial scenario, but it’ll reduce de guessing for most of the cases we normally face day to day.
      Orlando.

      Delete
  11. Hi, I was running backup to disk but now I want to switch to SBT_TAPE. I have run a full database backup to my SBT device using the following command: backup as backupset full check logical database
    Then I am trying to restore from the SBT backup but I am getting errors "RMAN-06054: media recovery requesting unknown archived log for thread...".
    Why is it looking for archived logs?
    From the controlfile information I have noticed that are datafiles with different Ckp SCN. For instance, in BS 19160 the datafiles have the Ckp SCN = 12489436489235, and for the BS 19161 the Ckp SCN is 12489436489236 (lastest BS).
    I am running the RESTORE/RECOVERY using the UNTIL SCN=12489436489237 (12489436489236 + 1).

    ReplyDelete
    Replies
    1. is this RAC database?
      If yes then every node has own thread.

      Delete
  12. sooper explanation. Still helped me.

    ReplyDelete

Zagreb u srcu!

Copyright © 2009-2014 Damir Vadas

All rights reserved.


Sign by Danasoft - Get Your Sign