详述Redo日志中操作系统块头信息损坏的处理过程

墨墨导读：前几天某客户遇到这个问题：文件系统损坏导致Current redo log异常，最终恢复过程比较简单，这里不再累述。本文详述redo log os header block损坏的处理过程，希望对大家有帮助。

如下是简单测试过程：

SQL> select * from v$Log;
GROUP#    THREAD#  SEQUENCE#      BYTES  BLOCKSIZE    MEMBERS ARCHIV STATUS                           FIRST_CHANGE# FIRST_TIME   NEXT_CHANGE# NEXT_TIME        CON_ID
---------- ---------- ---------- ---------- ---------- ---------- ------ -------------------------------- ------------- ------------ ------------ ------------ ----------
1          1         21  209715200        512          1 NO     INACTIVE                              13356440 12-MAY-20        13356445 12-MAY-20             0
2          1         22  209715200        512          1 NO     CURRENT                               13356445 12-MAY-20      9.2954E+18                       0
3          1         19  209715200        512          1 NO     INACTIVE                              13356426 12-MAY-20        13356435 12-MAY-20             0
4          1         20  104857600        512          1 NO     INACTIVE                              13356435 12-MAY-20        13356440 12-MAY-20             0
SQL> select member from v$Logfile;
MEMBER
--------------------------------------------------
/opt/oracle/oradata/ENMOTECH/redo03.log
/opt/oracle/oradata/ENMOTECH/redo02.log
/opt/oracle/oradata/ENMOTECH/redo01.log
/opt/oracle/oradata/ENMOTECH/redo04.log
SQL> shutdown abort;
ORACLE instance shut down.
SQL> host
[oracle@mysqldb1 ~]$ cp /opt/oracle/oradata/ENMOTECH/redo02.log /opt/oracle/oradata/ENMOTECH/redo02.log.bak

这里我们分别dump一下redo log的os block：

[oracle@mysqldb1 ~]$ dd if=/opt/oracle/oradata/ENMOTECH/redo02.log bs=512 count=1 | od -x
1+0 records in
1+0 records out
512 bytes (512 B) copied, 4.6621e-05 s, 11.0 MB/s
0000000 2200 0000 0000 ffc0 0000 0000 0000 0400
0000020 9dc3 0000 0200 0000 4000 0006 7c7d 7a7b
0000040 0003 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000
[oracle@mysqldb1 ~]$ dd if=/opt/oracle/oradata/ENMOTECH/redo01.log bs=512 count=1 | od -x
1+0 records in
1+0 records out
512 bytes (512 B) copied, 6.8299e-05 s, 7.5 MB/s
0000000 2200 0000 0000 ffc0 0000 0000 0000 0400
0000020 9dc3 0000 0200 0000 4000 0006 7c7d 7a7b
0000040 0003 0000 0000 0000 0000 0000 0000 0000
0000060 0000 0000 0000 0000 0000 0000 0000 0000
*
0001000

由于这2个redo log文件大小完全一致，因此dump内容完全一致。

这里我们针对上述内容进行简单解释：

22 ：表示file type；即logfile;如果为a2则表示是datafile

200: 转换为10进制为512，表示block size

4000 0006: 表示logfile size大小，单位是block；转换为10进制后卫4096000

7c7d 7b7a:表示mgiac number

0003: 表示file number

这里vi随便编辑输入一些内容，注意保证该文件大小必须为512 byte。

[oracle@mysqldb1 ~]$  dd if=/tmp/dd_redo_corrupt  of=/opt/oracle/oradata/ENMOTECH/redo02.log   bs=512 count=1 conv=notrunc
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000312707 s, 1.6 MB/s

启动数据库看情况：

SQL> startup
ORACLE instance started.
Total System Global Area 1157626160 bytes
Fixed Size                  9566512 bytes
Variable Size             671088640 bytes
Database Buffers          469762048 bytes
Redo Buffers                7208960 bytes
Database mounted.
ORA-03113: end-of-file on communication channel
Process ID: 85646
Session ID: 1 Serial number: 23142
2020-05-12T16:03:16.597704+08:00
Errors in file /opt/oracle/diag/rdbms/enmotech/enmotech/trace/enmotech_lgwr_85776.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/ENMOTECH/redo02.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 2
2020-05-12T16:03:16.597845+08:00
Errors in file /opt/oracle/diag/rdbms/enmotech/enmotech/trace/enmotech_lgwr_85776.trc:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/ENMOTECH/redo02.log'
ORA-27048: skgfifi: file header information is invalid
Additional information: 2
2020-05-12T16:03:16.601276+08:00
Errors in file /opt/oracle/diag/rdbms/enmotech/enmotech/trace/enmotech_ora_85852.trc:
ORA-00313: open failed for members of log group 1 of thread
ORA-00312: online log 2 thread 1: '/opt/oracle/oradata/ENMOTECH/redo02.log'
2020-05-12T16:03:16.690686+08:00
System state dump requested by (instance=1, osid=85852), summary=[abnormal instance termination].
System State dumped to trace file /opt/oracle/diag/rdbms/enmotech/enmotech/trace/enmotech_diag_85753.trc
USER (ospid: 85852): terminating the instance due to ORA error 313
2020-05-12T16:03:16.793750+08:00

如我们所想，报错完全一样，Oracle无法识别到这个redo logfile。

那么既然每个redo log文件的os block几乎都类似，能不能直接copy覆盖呢？答案是：当然可以。

尝试还原

首先将一个完好的redo log header 备份出来(注意确保redo 大小必须一致)：

[oracle@mysqldb1 ~]$ dd if=/opt/oracle/oradata/ENMOTECH/redo03.log of=/tmp/dd_good bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000187029 s, 2.7 MB/s

然后通过dd 进行还原:

[oracle@mysqldb1 ~]$  dd if=/tmp/dd_good  of=/opt/oracle/oradata/ENMOTECH/redo02.log   bs=512 count=1 conv=notrunc            
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.000256535 s, 2.0 MB/s
[oracle@mysqldb1 ~]$

启动数据库并进行验证

[oracle@mysqldb1 ~]$ sqlplus "/as sysdba"
SQL*Plus: Release 20.0.0.0.0 - Production on Tue May 12 16:06:03 2020
Version 20.2.0.0.0
Copyright (c) 1982, 2020, Oracle.  All rights reserved.
Connected to an idle instance.
SQL> startup
ORACLE instance started.
Total System Global Area 1157626160 bytes
Fixed Size                  9566512 bytes
Variable Size             671088640 bytes
Database Buffers          469762048 bytes
Redo Buffers                7208960 bytes
Database mounted.
Database opened.
SQL>
2020-05-12T16:06:19.563598+08:00
ALTER DATABASE OPEN
Ping without log force is disabled:
instance mounted in exclusive mode.
2020-05-12T16:06:19.580961+08:00
Crash Recovery excluding pdb 2 which was cleanly closed.
Endian type of dictionary set to little
2020-05-12T16:06:19.608797+08:00
Thread 1 advanced to log sequence 23 (thread open)
Redo log for group 3, sequence 23 is not located on DAX storage
Thread 1 opened at log sequence 23
Current log# 3 seq# 23 mem# 0: /opt/oracle/oradata/ENMOTECH/redo03.log
Successful open of redo thread 1
2020-05-12T16:06:19.625657+08:00
MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
stopping change tracking
2020-05-12T16:06:19.730666+08:00
TT00 (PID:86011): Gap Manager starting
2020-05-12T16:06:20.076465+08:00
Undo initialization recovery: Parallel FPTR failed: start:290504951 end:290504965 diff:14 ms (0.0 seconds)
Undo initialization recovery: err:0 start: 290504944 end: 290505032 diff: 88 ms (0.1 seconds)
[86009] Successfully onlined Undo Tablespace 2.
Undo initialization online undo segments: err:0 start: 290505037 end: 290505271 diff: 234 ms (0.2 seconds)
Undo initialization finished serial:0 start:290504944 end:290505281 diff:337 ms (0.3 seconds)
Database Characterset is AL32UTF8
2020-05-12T16:06:20.611451+08:00
No Resource Manager plan active
2020-05-12T16:06:21.606709+08:00
joxcsys_required_dirobj_exists: directory object exists with required path /opt/soft/javavm/admin/, pid 86009 cid 1
2020-05-12T16:06:21.690387+08:00
Starting background process RCBG
2020-05-12T16:06:21.765583+08:00
RCBG started with pid=45, OS id=86019
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process AQPC
2020-05-12T16:06:22.174256+08:00
AQPC started with pid=46, OS id=86021
2020-05-12T16:06:23.474703+08:00
PDB$SEED(2):Autotune of undo retention is turned on.
2020-05-12T16:06:23.505054+08:00
PDB$SEED(2):Endian type of dictionary set to little
PDB$SEED(2):Undo initialization finished serial:0 start:290508680 end:290508680 diff:0 ms (0.0 seconds)
PDB$SEED(2):Database Characterset for PDB$SEED is AL32UTF8
2020-05-12T16:06:24.881991+08:00
PDB$SEED(2):Opening pdb with no Resource Manager plan active
2020-05-12T16:06:25.222055+08:00
:
QPI: opatch file present, opatch
:
QPI: qopiprep.bat file present
2020-05-12T16:06:26.079591+08:00
KILLDB(3):Autotune of undo retention is turned on.
2020-05-12T16:06:26.113478+08:00
KILLDB(3):Endian type of dictionary set to little
KILLDB(3):Undo initialization recovery: Parallel FPTR failed: start:290511271 end:290511285 diff:14 ms (0.0 seconds)
KILLDB(3):Undo initialization recovery: err:0 start: 290511270 end: 290511340 diff: 70 ms (0.1 seconds)
KILLDB(3):[86009] Successfully onlined Undo Tablespace 2.
KILLDB(3):Undo initialization online undo segments: err:0 start: 290511340 end: 290511764 diff: 424 ms (0.4 seconds)
KILLDB(3):Undo initialization finished serial:0 start:290511270 end:290511786 diff:516 ms (0.5 seconds)
KILLDB(3):Database Characterset for KILLDB is AL32UTF8
2020-05-12T16:06:28.836526+08:00
KILLDB(3):Opening pdb with no Resource Manager plan active
KILLDB(3):joxcsys_required_dirobj_exists: directory object exists with required path /opt/soft/javavm/admin/, pid 86009 cid 3
Pluggable database KILLDB opened read write
2020-05-12T16:06:29.905303+08:00
Starting background process CJQ0
2020-05-12T16:06:29.979162+08:00
CJQ0 started with pid=52, OS id=86206
Completed: ALTER DATABASE OPEN

那么如果我数据库中的redo log大小不一致怎么办呢？

因为os block中有记录文件大小，大小不一致也没有关系，我们dd后编辑一下即可。

原文链接：http://www.killdb.com/2020/05/12/oracle_redo_log_corrupt/（复制到浏览器中打开或者点击“阅读原文”）

推荐阅读：144页！分享珍藏已久的数据库技术年刊

数据和云

ID：OraNews

如有收获，请划至底部，点击“在看”，谢谢！

点击下图查看更多 ↓

云和恩墨大讲堂 | 一个分享交流的地方

长按，识别二维码，加入万人交流社群

请备注：云和恩墨大讲堂

点个“在看”

你的喜欢会被看到❤

相关阅读:
快速幂
 某年元宵节大礼包矩阵快速幂
 HDU 3303 Harmony Forever 前缀和+树状数组||线段树
 HDU 4325 Flowers 树状数组+离散化
 11、【设计模式】构建器模式
 【基础】数据类型
 【Mybatis】Mybatis缓存
 【FTP】FTP(文件传输协议)工作原理（SFTP）
Docker是什么
 【RabbitMQ】使用RabbitMQ实现延迟任务
原文地址：https://www.cnblogs.com/hzcya1995/p/13311661.html