Database Administration: Standby Database Recovery

Problem Description:- One of the perticular archive log which is shipped from primary having the problem In alert log clearly mentioned that archive log does not contain any redo. MRP process while applying the that archive log on standby database that MRP background process terminated its self and shutdown. From that point onward in standby Recovery stopped due to that corrupted

Archive log and In alert log Clearly Mentioned Error Occurred while applying redo to database block file#183 and Block # 159382. File #183 is the One of the datafile in tablespace TS_DATA_JOBHISTORY.

On Primary :-
***********

SQL> Select name from v$datafile where file#=183;

NAME
--------------------------------------------------------------------------------
+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.614.737855891

Incident Date : Jan 31 14:38:42 2012
*************

Information on Standby database From Alert Log :-
**********************************************

Errors in file /u01/app/oracle/admin/MQ1D4PSB2/bdump/mq1d4p1_p003_457.trc
ORA-00339: archived log does not contain any redo
ORA-00334: archived log: '+DEP_DATA01/mq1d4psb2/onlinelog/group_12.631.737880099'
Errors in file /u01/app/oracle/admin/MQ1D4PSB2/bdump/mq1d4p1_p003_457.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [183], [159382], [6126], [], [], [], []
Tue Jan 31 14:38:45 2012
Errors in file /u01/app/oracle/admin/MQ1D4PSB2/bdump/mq1d4p1_p003_457.trc:
ORA-10562: Error occurred while applying redo to data block (file# 183, block# 159382)
ORA-10564: tablespace TS_DATA_JOBHISTORY
ORA-01110: data file 183: '+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.614.737855891'
ORA-10561: block type 'TRANSACTION MANAGED DATA BLOCK', data object# 164296
ORA-00600: internal error code, arguments: [kddummy_blkchk], [183], [159382], [6126], [], [], [], []
Tue Jan 31 14:38:47 2012
Errors with log +DEP_ARCH01/mq1d4psb2/archivelog/2012_01_31/thread_2_seq_114202.478.774023785
MRP0: Background Media Recovery terminated with error 12801
Tue Jan 31 14:38:47 2012
Errors in file /u01/app/oracle/admin/MQ1D4PSB2/bdump/mq1d4p1_mrp0_439.trc:
ORA-12801: error signaled in parallel query server P003, instance mq1-racs-dep-01.speechmachines.biz:MQ1D4P1 (1)
ORA-10562: Error occurred while applying redo to data block (file# 183, block# 159382)
ORA-10564: tablespace TS_DATA_JOBHISTORY
ORA-01110: data fi
Recovery interrupted!
Tue Jan 31 14:39:49 2012
Recovered data files to a consistent state at change 1015334776093
Tue Jan 31 14:39:50 2012
Errors in file /u01/app/oracle/admin/MQ1D4PSB2/bdump/mq1d4p1_mrp0_439.trc:
ORA-12801: error signaled in parallel query server P003, instance mq1-racs-dep-01.speechmachines.biz:MQ1D4P1 (1)
ORA-10562: Error occurred while applying redo to data block (file# 183, block# 159382)
ORA-10564: tablespace TS_DATA_JOBHISTORY
ORA-01110: data fi
Tue Jan 31 14:39:50 2012
MRP0: Background Media Recovery process shutdown (MQ1D4P1)

Problem resolutions :- They are two solutions
*********************

Solution 1) Copy the Archive log which is having from primary to standby and start managed recovery on Standby

Solution 2) Copy the datafile from primary to standby which is having the problem and apply archivelogs

In our case we tested Solution 1 which not resolved the problem. So we went to Solution 2 which is fixed the problem.

Solution 2) Copy the datafile from primary to standby which is having the problem and apply archivelogs

Before Proceeding this According Oracle support suggesstion Increasing Log ARchive Max Process on Primary & Standby. Becuase currently setting up

on both primary & standby is log_archive_max_processes=5 which is too low.

You can set it through alter system, on the primary:

SQL> ALTER SYSTEM SET log_archive_max_processes=10 Scope=both;

on the standby, since it is servicing four primary instances.

SQL> ALTER SYSTEM SET log_archive_max_processes=15 scope=both;

On Primary:-
**********

Step 1) Copy The datafile & archivelog from primary database to standby database which is having the problem

Step 1) Copy the Datafile from ASM to Normal Filesystem on primary

RMAN>copy datafile '+DEP_DATA01/mq1d4prd/datafile/ts_data_jobhistory.724.736593745' to '/u02/oracle/backups/ts_data_jobhistory01.dbf';

Step 2) Copy the Archivelog from ASM to Normal Filesystem on primary

RMAN> copy archivelog '+DEP_ARCH01/mq1d4prd/archivelog/2012_02_01/thread_2_seq_114202.18926.774139397' to '/u02/oracle/backups/thread_2_seq_114202.arc';

Step 3) Move all copied files to Standby server thorough SCP on any of the location

scp /u02/oracle/backups/thread_2_seq_114202.arc oracle@mq1-racs-dep-01:/u02/oracle/backups/archivelogs

On Standby:-
**********

Step 1) stop the Recovery on Standby

alter database recover managed standby database cancel;

Step 1) catalog the datafile which is moved on Standby server

RMAN> catalog datafilecopy '/u02/oracle/backups/archivelogs/ts_data_jobhistory01.dbf';

Step 2) Now Copy to above file to ASM.

Rman > Connect target

RMAN> copy datafilecopy '/u02/oracle/backups/archivelogs/ts_data_jobhistory01.dbf' to '+DEP_DATA01';

Starting backup at 01-FEB-2012 21:46:49
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=192 instance=MQ1D4P1 devtype=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: sid=614 instance=MQ1D4P1 devtype=DISK
channel ORA_DISK_1: starting datafile copy
input is copy of datafile 00183: /u02/oracle/backups/archivelogs/ts_data_jobhistory01.dbf
output filename=+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.394.774136015 tag=TAG20120201T145432 recid=4207 stamp=774136119
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:01:57
Finished backup at 01-FEB-2012 21:48:49

Starting Control File and SPFILE Autobackup at 01-FEB-2012 21:48:49
piece handle=/u02/oracle/backups/MQ1D4P/archivelog/MQ1D4P_c-2932372620-20120201-05.CTL.bak comment=NONE
Finished Control File and SPFILE Autobackup at 01-FEB-2012 21:48:54

Step 3) Get the file# for the datafile what we are replacing in Standy. Run below command on Standby server

RMAN> list copy of datafile 183;

List of Datafile Copies
Key File S Completion Time Ckp SCN Ckp Time Name
------- ---- - -------------------- ---------- -------------------- ----
4207 183 A 01-FEB-2012 21:48:39 1015533842632 01-FEB-2012 14:54:32 +DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.394.774136015 ------> This is the one Which is Moved to ASM from File system After Copied from Primary
4206 183 A 01-FEB-2012 21:46:20 1015533842632 01-FEB-2012 14:54:32 /u02/oracle/backups/archivelogs/ts_data_jobhistory01.dbf ------> This is the one original copy before moved to ASM after moving from PRimary ( Transient Copy)

Step 4) Go to sqlplus on standby database

SQL>Select name from v$datafile where file#=183;

NAME
--------------------------------------------------------------------------------
+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.614.737855891

Step 5) However we would need to set standby_file_management=manual temporarily for this operation.

SQL>Alter system set standby_file_management=manual scope=both ;

Step 6) We would need to rename this to the new file name show from above rman command

SQL> alter database rename file '+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.614.737855891' to '+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.394.774136015';

+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.614.737855891 ===> Name from Standy database step 5.

+DEP_DATA01/mq1d4psb2/datafile/ts_data_jobhistory.394.774136015 ====> Name file name show from above rman command Step 2 (or) Step 3

Step 6) parameter standby_file_management is set back to auto.

SQL> Alter system set standby_file_management=auto scope=both ;

Step 7) After the activity is over, remove the transient datafile copy i.e which is shown in Step 3

RMAN> delete datafilecopy '/u02/oracle/backups/archivelogs/ts_data_jobhistory01.dbf';

Step 8) Start recovery on Standby database for manually for applying probamatic archivelog;

SQL> recover standby database until cancel;

supply the path of Archive logs which we moved from primary to standby then stop recovery mode and start automatic managed recovery through below command.

SQL> alter database recover managed standby database disconnect;

Step 9) It will start recovery mode keep monitor alerts it will ask you if any archive logs missed or deleted on standby database.

For Example:

In alert log of standby:-

FAL[client]: Failed to request gap sequence
GAP - thread 2 sequence 114204-114215
DBID 2932372620 branch 624923210

Step 1) Above archive logs doesnt not exists on standby for some reason ( deleted through daily jobs) Now Just restore those archive logs on primary.

Once restore done those arhive logs automatically ship to standby. Once reached those archive logs to standby MRP process will take responsibilty

to apply and recover.

On Primary:-
**********

RMAN> restore archivelog from sequence 114204 until sequence 114215 thread 2;

Finally after applied all archive logs standby is current with primary.

Database Administration

Monday, 13 February 2012

Standby Database Recovery

1 comment: