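An Oracle 10.2.0.2 database on the HP-UX Itanium platform terminated unexpectedly. Maintenance staff tried to restart the instance, but a few seconds after the database opened, the SMON background process reported ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], and the instance terminated unexpectedly again.
Part of the alert log follows: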
Corrupt Block Found
TSN = 50, TSNAME = TS_DNI_AAL_12
RFN = 120, BLK = 849708, RDBA = 504166188
OBJN = 701796, OBJD = 701796, OBJECT = MAP_WOL_SILJUK, SUBOBJECT =
SEGMENT OWNER = DBOWN, SEGMENT TYPE = Table Segment
Corrupt Block Found
TSN = 50, TSNAME = TS_DNI_AAL_12
RFN = 121, BLK = 897927, RDBA = 508408711
OBJN = 701796, OBJD = 701796, OBJECT = MAP_WOL_SILJUK, SUBOBJECT =
SEGMENT OWNER = DBOWN, SEGMENT TYPE = Table Segment
Mon May 4 19:38:19 2009
Errors in file /oracle/admin/TDAY2DB/udump/tday2db_ora_2080.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [121], [897927], [6110], [], [], [], []
Mon May 4 19:38:19 2009
Errors in file /oracle/admin/TDAY2DB/udump/tday2db_ora_2077.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [849708], [6110], [], [], [], []
Mon May 4 21:58:55 2009
Recovery of Online Redo Log: Thread 1 Group 5 Seq 90582 Reading mem 0
Mem# 0: /dev/vx/rdsk/day2db1tdg03/redo05.log
Block recovery completed at rba 90582.42.16, scn 1858.566253676
ORACLE Instance TDAY2DB (pid = 22) - Error 81 encountered while recovering transaction (14, 22) on object 701796.
Mon May 4 21:58:55 2009
Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_smon_17651.trc:
ORA-00081: address range [0x60000000000BD230, 0x60000000000BD234) is not readable
ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], []
Mon May 4 21:58:55 2009
Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_smon_17651.trc:
ORA-00081: address range [0x60000000000BD230, 0x60000000000BD234) is not readable
ORA-00081: address range [0x60000000000BD230, 0x60000000000BD234) is not readable
ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], []
Mon May 4 21:58:57 2009
Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_p020_17710.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], []
Mon May 4 21:58:58 2009
Doing block recovery for file 120 block 856039
Block recovery from logseq 90582, block 37 to scn 7980615489643
Mon May 4 21:58:58 2009
Recovery of Online Redo Log: Thread 1 Group 5 Seq 90582 Reading mem 0
Mem# 0: /dev/vx/rdsk/day2db1tdg03/redo05.log
Block recovery completed at rba 90582.42.16, scn 1858.566253676
Mon May 4 21:58:58 2009
SMON: Restarting fast_start parallel rollback
Mon May 4 21:58:58 2009
Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_p000_17661.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], []
Mon May 4 21:58:59 2009
Doing block recovery for file 120 block 856039
Block recovery from logseq 90582, block 37 to scn 7980615489643
Mon May 4 21:58:59 2009
Recovery of Online Redo Log: Thread 1 Group 5 Seq 90582 Reading mem 0
Mem# 0: /dev/vx/rdsk/day2db1tdg03/redo05.log
Block recovery completed at rba 90582.42.16, scn 1858.566253676
Mon May 4 21:58:59 2009
SMON: ignoring slave err,downgrading to serial rollback
Mon May 4 21:59:00 2009
Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_smon_17651.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [120], [856039], [6110], [], [], [], []
Mon May 4 21:59:09 2009
Errors in file /oracle/admin/TDAY2DB/bdump/tday2db_pmon_17633.trc:
ORA-00474: SMON process terminated with error
Mon May 4 21:59:09 2009
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 17633
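A side note on reading the "Corrupt Block Found" entries in the log above: the RDBA value is simply the relative file number and the block number packed into one 32-bit integer. The sketch below decodes the two RDBA values from the log with plain arithmetic. It assumes a smallfile tablespace, where the upper 10 bits hold the relative file number and the lower 22 bits hold the block number; the column aliases are illustrative only.

```sql
-- Assumption: smallfile tablespace, so RDBA = RFN * 2^22 + BLK (2^22 = 4194304).

-- First entry in the alert log: RDBA = 504166188
SELECT TRUNC(504166188 / 4194304) AS rfn,   -- 120
       MOD(504166188, 4194304)    AS blk    -- 849708
FROM   dual;

-- Second entry in the alert log: RDBA = 508408711
SELECT TRUNC(508408711 / 4194304) AS rfn,   -- 121
       MOD(508408711, 4194304)    AS blk    -- 897927
FROM   dual;
```

Both results match the RFN and BLK values printed next to each RDBA in the log. On a database that is open, DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE and DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK perform the same decoding.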
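When an Oracle process reads a data block it performs a series of logical checks; if it finds logical corruption in the block, it raises an internal error such as ORA-00600 [kddummy_blkchk]. The internal function [kddummy_blkchk] behaves much like [kdBlkCheckError], and both report three arguments: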
ORA-600 [kddummy_blkchk] [file#] [block#] [check code]
ORA-600 [kdBlkCheckError] [file#] [block#] [check code]
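file# is the file number of the datafile that contains the problem block, block# is the number of that block, and check code identifies the type of check that detected the logical corruption. The file# and block# can also be used to find the affected object. In this case file# is 120, block# is 856039, and the check code is 6110:
SELECT segment_name, segment_type, owner FROM dba_extents WHERE file_id = 120 AND 856039 BETWEEN block_id AND block_id + blocks - 1;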
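Of course, the query above only works if the database can be opened. When ORA-600 [kddummy_blkchk] or [kdBlkCheckError] causes the instance to terminate unexpectedly or to fail on startup, we can set the two parameters db_block_checking and db_block_checksum to false to stop Oracle processes from performing some of these logical block checks: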
SQL> alter system set db_block_checking=false;
System altered.
SQL> alter system set db_block_checksum=false;
System altered.
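These settings avoid ORA-600 [kddummy_blkchk] and [kdBlkCheckError] to some extent. In 10g, however, the hidden parameter _db_always_check_system_ts controls whether Oracle performs block checking and checksums on objects in the SYSTEM tablespace (_db_always_check_system_ts: Always perform block check and checksum for System tablespace), and it defaults to TRUE, so there is still some chance that the database cannot be opened. If the first argument (file#) of the ORA-600 [kddummy_blkchk] error points to a datafile of the SYSTEM tablespace, you can try event 10513 to additionally stop SMON from performing transaction recovery after the database opens: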
SQL> alter system set event='10513 trace name context forever,level 2' scope=spfile;
System altered.
SQL> shutdown immediate;
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> startup ;
ORACLE instance started.
Total System Global Area 2634022912 bytes
Fixed Size 2086288 bytes
Variable Size 2399144560 bytes
Database Buffers 218103808 bytes
Redo Buffers 14688256 bytes
Database mounted.
Database opened.
SQL> show parameter event
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
event string 10513 trace name context forever,level 2
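To confirm what the instance is actually running with, the parameters discussed above can be queried directly. This is a minimal sketch: the second query reads the undocumented X$ fixed tables, so it assumes a SYS connection, and those structures may differ between versions.

```sql
-- Documented parameters changed earlier in this note
SELECT name, value
FROM   v$parameter
WHERE  name IN ('db_block_checking', 'db_block_checksum');

-- Hidden parameter that governs checks on the SYSTEM tablespace.
-- Assumption: connected as SYS; X$KSPPI / X$KSPPCV are undocumented.
SELECT i.ksppinm  AS name,
       v.ksppstvl AS value,
       i.ksppdesc AS description
FROM   x$ksppi i, x$ksppcv v
WHERE  i.indx = v.indx
AND    i.ksppinm = '_db_always_check_system_ts';
```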
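In this case we set db_block_checksum and db_block_checking to false, opened the database, and then exported and re-imported the table that contained the problem blocks, which resolved the issue.
In 10g db_block_checksum defaults to TRUE, so it is recommended to restore this parameter once a problem of this kind has been resolved.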
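A minimal sketch of the clean-up, assuming the instance uses an spfile and that event 10513 was the only entry placed in the event parameter (RESET removes the whole event setting, so any other events kept there would need to be set again):

```sql
-- Restore the 10g default for checksums
ALTER SYSTEM SET db_block_checksum = TRUE;

-- db_block_checking defaults to FALSE in 10g; change it back only if
-- the site ran with a different value before the incident.

-- Only if event 10513 was set during recovery: remove it from the spfile.
-- The change takes effect at the next instance restart, after which SMON
-- resumes transaction recovery.
ALTER SYSTEM RESET event SCOPE=SPFILE SID='*';
```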