The Importance of Bounded Recovery
Bounded Recovery is a component of Oracle GoldenGate’s Extract process checkpointing
facility. It guarantees an efficient recovery after Extract stops for any
reason, planned or unplanned, no matter how many open (uncommitted)
transactions there were at the time that Extract stopped, nor how old they
were. Bounded Recovery sets an upper boundary for the maximum amount of time
that it would take for Extract to recover to the point where it stopped and
then resume normal processing.
Extract performs this recovery as follows:
·
If there were no open transactions when Extract stopped, the
recovery begins at the current Extract read checkpoint. This is a normal
recovery.
·
If there were open transactions whose start points in the
log were very close in time to the time when Extract
stopped, Extract begins recovery by re-reading the logs from the beginning of
the oldest open transaction. This requires Extract to do redundant work for
transactions that were already written to the trail or discarded before Extract
stopped, but that work is an acceptable cost given the relatively small amount
of data to process. This also is considered a normal recovery.
·
If there were one or more transactions that Extract
qualified as long-running open transactions, Extract
begins its recovery with a Bounded Recovery.
Bounded Recovery is new feature in OGG 11.1, this is how it
works:
A transaction qualifies as long-running if it has been open
longer than one Bounded Recovery interval, which is specified with the
BRINTERVAL option of the BR parameter.
For example, if the Bounded Recovery interval is four hours,
a long-running open transaction is any transaction that started more than four
hours ago.
At each Bounded Recovery interval, Extract makes a Bounded
Recovery checkpoint, which persists the current state and data of Extract to
disk, including the state and data (if any) of long-running transactions. If
Extract stops after a Bounded Recovery checkpoint, it will recover from a
position within the previous Bounded Recovery interval or at the last Bounded
Recovery checkpoint, instead of processing from the log position where the
oldest open long-running transaction first appeared, which could be several
trail files ago.
Bounded Recovery is enabled by default for Extract processes
and has a 4 hour BR interval. To adjust the BR interval to say 24 hours, use
the following syntax in your Extract parameter file:
BR BRINTERVAL 24, BRDIR
BR
The default location for BR checkpoint files is the GoldenGate home directory. This can be altered by including
a full path:
BR BRINTERVAL 24, BRDIR
/ggsdata/brcheckpoint
Case Study
The Problem
In a recent case, Bounded Recovery was disabled through the
following Extract parameter:
BR BROFF
Consequently the following behavior prevented the Extract process
from recovering and starting.
1.
Firstly, GoldenGate had fallen
behind due to a batch job and subsequently the Extract process was reading the
archived redologs and not the online redologs. Also at this time an archived redolog
was deleted by RMAN during a scheduled backup, that caused the Extract process
to abend with OGG-00446 (caused by ORA-15173)
Error in ggserr.log
2012-07-04 11:03:03 ERROR
OGG-00446 Oracle GoldenGate Capture for Oracle, euktds01.prm: Getting attributes for ASM file
+FRA/2_86717_716466928.dbf, SQL <BEGIN dbms_diskgroup.getfileattr('+FRA/2_86717_716466928.dbf',
:filetype, :filesize, :lblksize); END;>: (15056) ORA-15056: additional error
message ORA-15173: entry '2_86717_716466928.dbf' does not exist in directory
'/' ORA-06512: at "SYS.X$DBMS_DISKGROUP", line 304 ORA-06512: at line
1Not able to establish initial position for sequence 86717, rba
122140688.
2012-07-04 11:03:03 ERROR
OGG-01668 Oracle GoldenGate Capture for Oracle, euktds01.prm: PROCESS ABENDING.
2.
Some hours later, the deleted archived redolog
file was restored and the Extract process restarted. However, despite the
process running, the RBA# and Sequence# were not incrementing. The Extract
process was stuck!
The INFO GGSCI command with
DETAIL option revealed the source redo was not available.
GGSCI (dbserver09a) 2> info EUKMDS01,
detail
Extract Source
Begin End
Not Available
2012-07-04 23:30
2012-07-04 23:30
Not Available
2012-07-04 23:28
2012-07-04 23:30
Not Available
2012-07-01 05:35
2012-07-04 23:28
+DATA/ukhub/onlinelog/group_4.282.716467031 2012-06-24
05:28 2012-07-01 05:35
+DATA/ukhub/onlinelog/group_3.280.716467027 2012-06-23
21:06 2012-06-24 05:28
3.
The ggserr.log also
revealed a long running transaction detected.
2012-07-04 23:31:47 WARNING OGG-01027 Oracle GoldenGate
Capture for Oracle, euko1els.prm: Long
Running Transaction: XID 197.8.3521317, Items 0, Extract EUKO1ELS, Redo Thread
2, SCN 51.3925309013 (222968641109), Redo Seq #86717,
Redo RBA 122140688.
The Solution
The Extract process was
stuck in recovery mode, but could not find the starting RBA. In order to get
the process up and running, the following steps were executed on the source
system.
1.
First of all, the Extract
process was stopped with the force option.
GGSCI (dbserver09a) 4> send extract
EUKMDS01, forcestop
2.
The start position of the
Extract process was altered to the beginning of the long running transaction.
GGSCI (dbserver09a) 5> alter extract
EUKMDS01, begin 2012-07-04 23:31:47
3.
The extract process was
started.
GGSCI (dbserver09a) 4> start extract
EUKMDS01
4.
Sure enough, the Extract process
was reinitialized and continued to process the backlog.
GGSCI (uklpdptoy09a) 2> info
EUKMDS01, detail
Extract Source
Begin End
+DATA/ukhub/onlinelog/group_4.282.716467031 2012-07-04
23:31 2012-07-05 02:58
Not Available
* Initialized * 2012-07-04 23:31
Not Available
2012-07-04 23:30
2012-07-04 23:30
Conclusion
Never disable Bounded
Recovery else Extract processes may fail to recover automatically. Furthermore,
to prevent RMAN from deleting archived log files that are still required. If
you register the extract with LOGRETENTION then the GoldenGate
will retain the archive logs that Extract needs for recovery.
To register Extract do the following:
1.
Stop the Extract ( Ensure that all the archive
log files starting from recovery checkpoint till current checkpoint is
available on all nodes )
2.
Execute the following
GGSCI commands
GGSCI> dblogin
userid <username>, password <password>
GGSCI> register extract
<Extract-name>, LOGRETENTION
You can confirm whether Extract
is registered or not using the query “select * from dba_capture”.
(This sounds like Streams!) This
should have an entry for Extract.
3.
Start the Extract
GGSCI>start extract <Extract-name>