The scenario
To describe starting situation, let us suppose this restore/recover scenario (we are talking about incomplete recover).- startup nomount
- restore control file
- alter database mount
- restore database
RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of alter db command at 01/11/2010 12:08:19 ORA-01152: file 1 was not restored from a sufficiently old backup ORA-01110: data file 1: '+DATA/hac/datafile/system.366.708004385'So we are stacked because recover could not be consistent. The main problem is that restored data files need ALL to be recovered at least to largest SCN number+1.
This situation is explained world wide but main main question is: what is the lowest SCN that can recover database enough to be open?
Solution-RMAN list backup
First solution is to look in RMAN repository or use control file records to retrieve backup of datafile 1. In a case of RMAN catalog you'll not have too much trouble because you'll find enough information to see backup of wanted datafile and it's SCN. Output may looks like:RMAN> list backup of datafile 1; List of Backup Sets =================== BS Key Type LV Size Device Type Elapsed Time Completion Time ------- ---- -- ---------- ----------- ------------ --------------- 645 Full 127.30M DISK 00:01:02 06-DEC-09 BP Key: 645 Status: AVAILABLE Compressed: YES Tag: TAG20091206T183018 Piece Name: +BCKP/hac/backupset/2009_12_06/tag20091206t183018_0.7966.704917821 List of Datafiles in backup set 645 File LV Type Ckp SCN Ckp Time Name ---- -- ---- ---------- --------- ---- 1 Full 35728924 06-DEC-09 +DATA/hac/datafile/system.282.694017165 ...If you are lucky and get result look in column "Ckp SCN" and you'll get missing SCN. Unfortunately this may not help you always.
In nocatalog mode it is more then likely that your command may retrieve no data:
RMAN> list backup of datafile 1; RMAN>No backup! The reason is no backup was taken in time provided with control_file_record_keep_time parameter.
Solution-RMAN list copy
Next solution is usage of RMAN's "list copy", which shows only backup of "datafile 1", with the same retention as described in previously explained in "List backup" solution. However if you are lucky and get result it is more or less like:RMAN> list copy of datafile 1; List of Datafile Copies Key File S Completion Time Ckp SCN Ckp Time Name ------- ---- - --------------- ---------- --------------- --------------------------------------------- 1917 1 A 08-JAN-10 49152352 08-JAN-10 /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_nvl2urjr 1867 1 X 07-JAN-10 49125664 07-JAN-10 /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_m4l2sii8 1841 1 X 07-JAN-10 49117957 07-JAN-10 /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_l7l2sb2u 1813 1 X 18-SEP-09 9643175 18-SEP-09 /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_edkpitiv 1666 1 A 17-SEP-09 9174432 17-SEP-09 /u01/HAC_I-1799195971_TS-SYSTEM_FNO-1_ahkpfgq1
Again, regardless shown several SCNs this might not be enough to help you.
Solution-use x$ (C based) tables
x$ tables are Oracle's core tables based on pure C data and should be handle with additional care. The beauty of them is they are available when database in mounted and when all "dba_*" views are not available as well as some v$ based ones.To see status of all restored datafiles can be done with this SQL
set linesize 200; set pagesize 100; col inst_id for 9999999 heading 'Instance #' col file_nr for 9999999 heading 'File #' col file_name for A50 heading 'File name' col checkpoint_change_nr for 99999999999999 heading 'Checkpoint #' col checkpoint_change_time for A20 heading 'Checkpoint time' col last_change_nr for 99999999999999 heading 'Last change #' SELECT fe.inst_id, fe.fenum file_nr, fn.fnnam file_name, TO_NUMBER (fe.fecps) checkpoint_change_nr, fe.fecpt checkpoint_change_time, fe.fests last_change_nr, DECODE ( fe.fetsn, 0, DECODE (BITAND (fe.festa, 2), 0, 'SYSOFF', 'SYSTEM'), DECODE (BITAND (fe.festa, 18), 0, 'OFFLINE', 2, 'ONLINE', 'RECOVER') ) status FROM x$kccfe fe, x$kccfn fn WHERE ( (fe.fepax != 65535 AND fe.fepax != 0 ) OR (fe.fepax = 65535 OR fe.fepax = 0) ) AND fn.fnfno = fe.fenum AND fe.fefnh = fn.fnnum AND fe.fedup != 0 AND fn.fntyp = 4 AND fn.fnnam IS NOT NULL AND BITAND (fn.fnflg, 4) != 4 ORDER BY fe.fenum ;The output might looks like:
Instance # File # File name Checkpoint # Checkpoint time Last change # STATUS ---------- -------- -------------------------------------------------- --------------- -------------------- ---------------- ---------------------------- 1 1 /db/SOME_DB/dbf01/system01.dbf 12039685200 03/10/2021 17:18:23 SYSTEM 1 2 /db/SOME_DB/dbf01/sysaux01.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 3 /db/SOME_DB/und_01/undotbs01.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 4 /db/SOME_DB/dbf01/users01.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 5 /db/SOME_DB/dbf01/gen_gfc_prod01.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 6 /db/SOME_DB/dbf01/gen_gfc_prod02.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 7 /db/SOME_DB/dbf01/gen_gfc_prod03.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 8 /db/SOME_DB/dbf01/gen_gfc_prod04.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 9 /db/SOME_DB/dbf01/gen_gfc_prod05.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 10 /db/SOME_DB/dbf01/gen_gfc_prod06.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 11 /db/SOME_DB/dbf01/gen_gfc_prod07.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 12 /db/SOME_DB/dbf01/sysaud01.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 13 /db/SOME_DB/dbf01/gen_gfc_prod08 9340774962 01/04/2021 11:04:30 9341443930 RECOVER 1 14 /db/SOME_DB/dbf01/gen_gfc_prod08.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 15 /db/SOME_DB/dbf01/gen_gfc_prod09.dbf 12039685200 03/10/2021 17:18:23 ONLINE 1 16 /db/SOME_DB/dbf01/gen_gfc_prod10.dbf 12039685200 03/10/2021 17:18:23 ONLINE 16 rows selected.As you see File# 13 needs recovery and last applied change was in SCN=9341443930. If you cannot retrive exact time (scn_to_timestamp), here you have in column Checkpoint time date value which you can uses as well.
Remember that SCN for recover is always xxxxxxxx+1, for one bigger then one found in list!
To cover the whole story, for successful recover of any backup set like that, you need to apply archive logs which's SCN's are between min(xxxxxxxx)-max(xxxxxxxx)+1 but you do not need to specify min(xxxxxxxx) because RMAN will know that automatically.
So your restore should use until SCN clause in RMAN script. In pseudo language that looks like:
startup nomount; restore controlfile; alter database mount; set until scn 9341443931 restore database; recover database; alter database open resetlogs;
If all SCNs from previous query are the same then (as they are in our case) then you are dealing with consistent (aka cold backup). Cold backup means that all datafiles are having same SCN so practicly they do not need recover but only as declarative term (you do not need any archived log file beside). In a cases like that, your restore RMAN script might looks like:
restore database; recover database noredo; alter database open resetlogs;
However, in some consistent backup cases, I remember that I found on some database version a bug as explained at the begginig (ORA-01152). Based on previously shown real numbers, your "full proof" restore script for consistent backup, should look like:
set until scn=9341443931; restore database; recover database noredo; alter database open resetlogs;So there is no affraid of use until clause even in restoring consistent backups. This is especially the case in "backup as copy" when you deal with backup outside of ASM storage.
Solution-RMAN restore database preview
To prevent searching for information in restore/recover time, there is elegant method that will tell you which SCN is needed for successful restore.After regular database backup, use MAN command which will collect and show all important data for this backupset.
backup database; restore database preview;At the end of command shows exact SCN that you need for successful recover. At the end of log file output is like:
Media recovery start SCN is 9341443930 Recovery must be done beyond SCN 9341443930 to clear data files fuzziness Finished restore at 17-SEP-21 released channel: t1 released channel: t2 RMAN> Recovery Manager complete.where xxxxxxxxx+1 represent SCN you are searching for.
Cheers!
Very nice. So clear and easy to understand the main point. Appreciate a lot.
ReplyDeleteLeika
Thank you Leika.
ReplyDeleteGlad you'd find it useful.
When I run a RESTORE PREVIEW, it ends with:
ReplyDeleteMedia recovery start SCN is 2483662717
Recovery must be done beyond SCN 2484364327 to clear datafile fuzziness
Finished restore at 25-MAY-2011 22:03:08
Why are the SCNs different here?
Since this was a one off special bkup taken to disk (not tsm which is typical), how do I verfiy I have all my archive logs on disk before I begin restore & recovery?
How do you obtain SCN in your post?
ReplyDelete"Media recovery start SCN is 2483662717"
ReplyDeleteThis is starting SCN
When backup was taking place, SCN are changed so when backup ends, then SCN was "2484364327".
This is the number of SCN that you must have in archived redo logs to have valid backup.
Thank you. This post helped me do a successful restore.
ReplyDeleteAppreciated your efforts!! Thanks A lot for writing this block.
ReplyDeleteone question, do we need all the time to be increased the scn by 1 , suppose if the scn is 12345 while recovering the db should it be 12346? also, what about if I go through the recovery using time stamp?
ReplyDeleteyes
DeleteHi, can you explain how you got to these two tables/views?
ReplyDeleteFROM x$kccfe fe,
x$kccfn fn
searching internet .... and looking on many places...
DeleteHi Damir, what about this scenario: We have a level 0 backup that starts on day 1 at 6:00 am and runs for over 30 hours, ending on day 2 at 13:00, during the time no archive log backups are taken but the LV0 includes archive log all and deletes them.
ReplyDeleteNow, we need to restore the DB back to day 1 at 14:00, obviously this LV0 would not help since it had not finished by then; but we do have previous LV0, previous LV1 and all archive logs to roll forward.
Question is: what is the best place to find the exact SCN to use here (the closest to day 1 14:00+), we don't have much info on the catalog because no archive logs were backed up during the LV0, so the RC_ARCHIVE_LOG view does not help much, we could use the v$log_history and list backup, and a few other places, but is there a better and more accurate way? I like the idea of using the x$ tables, although even querying them might be dangerous sometimes, but probably more accurate.
Any thoughts and guidance would be very much appreciated, you info here is great and I have enjoyed reading it very much. Keep up the good work.
Thanks
Orlando.
Your restore must begin with backup from time when backup completed BEFORE restore point. From that moment, use archived logs (probably from both backups) and come to desired restore moment.
DeleteTo define restore, you do not have to use SCN but also timestamp as well. here are some examples of that:
SET UNTIL SCN 1000;
# Alternatives:
SET UNTIL TIME 'Nov 15 2004 09:00:00';
SET UNTIL SEQUENCE 9923;
If you want to stay in SCN terminology, look for TIMESTAMP_TO_SCN function (http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions176.htm)
Hope this helps,
Damir
Thanks for the quick response Damir, yes this helps. Now, using until time might have some issues due to “Bug 11694127” which might truncate the time part of the date, in fact the workaround Oracle offers (Doc ID 11694127.8) does not work consistently, I have tested that a few times.
DeleteDue to that, yes I always use SCN for all my restores and duplicates; also, the TIMESTAMP_TO_SCN does not work all the time either, because as you know, some of the SCN’s returned might not be related to a log switch but for some other checkpoint activity, so I have seen RMAN trying to honor an SCN and no finding proper logs to apply, although all logs are there.
Finally, the sequence values also help a lot, but when dealing with RAC environments, it is a bit confusing for some people because each instance normally have a different set of numbers and on top of that they also have multiple treads so it is not that easy to come up with the exact SCN.
What we are trying to do is to have a very solid and bullet proof method and hopefully a small script that will take a time stamp as parameter and return that exact SCN we need every single time, whether we use a catalog or just control file.
I have a very simple script that works most of the times, but for the long running scenario I described earlier, it might not be provide what we need and it only works with catalogs. As soon as I come up with a better one I’ll let you know.
Nice talking to you again.
Thanks for sharing this info. Looking forward to see that script-appreciated in front.
DeleteBut now come to mine mind...
If you want something bullet proof restore methods with SCN, maybe, just maybe, idea to place some logging activities in PLSQL which will save data to plain file (with action name, time and SCN ... and thread as well in RAC case ... anything you need), before any interesting activity over any kind of batch or mass processing.
This would be really huge help to get proper values for any restore in the future.
Hope this helps,
Damir
Here is an example how to create autonomous backup,where you can see how to gather all important values youu need for restore and cover RAC version
Deletehttp://damir-vadas.blogspot.com/2009/10/autonomous-rman-online-backup.html
Cheers
Damir
Good point, in fact I’ve been thinking of incorporating “Restore Points” as part of my backups, without using flashback. Of course that would not work for my initial scenario, but it’ll reduce de guessing for most of the cases we normally face day to day.
DeleteOrlando.
Hi, I was running backup to disk but now I want to switch to SBT_TAPE. I have run a full database backup to my SBT device using the following command: backup as backupset full check logical database
ReplyDeleteThen I am trying to restore from the SBT backup but I am getting errors "RMAN-06054: media recovery requesting unknown archived log for thread...".
Why is it looking for archived logs?
From the controlfile information I have noticed that are datafiles with different Ckp SCN. For instance, in BS 19160 the datafiles have the Ckp SCN = 12489436489235, and for the BS 19161 the Ckp SCN is 12489436489236 (lastest BS).
I am running the RESTORE/RECOVERY using the UNTIL SCN=12489436489237 (12489436489236 + 1).
is this RAC database?
DeleteIf yes then every node has own thread.
sooper explanation. Still helped me.
ReplyDeleteGood Blog. Nicely written.
ReplyDeleteThank you
Delete