1.1现象

之前有个客户遇到一个问题,OGG同步数据链路,突然有一天网络出现问题,导致OGG投递进程无法正常投递,无法写入目标端的该文件。

猜测是由于网络丢包等原因导致文件损坏,无法正常open,read,write. 解决方法,投递进程etrollover。

本篇文档是基于这种方式测试下etrollover 【测试没有完美还原网络的问题,只是对其进行了测试】

1.2测试OGG进程restart与seqno有什么关系?

  1. 1)OGG 同步表及进程参数查看
    SQL> select * from dd;
  2. ID CC_NAME WITTIME
  3. ---------- ------------------------------ ------------------------------
  4. 2 2 03-JUN-20 02.34.37.000000 PM
  5.  
  6. GGSCI (t1) 4> view param exta
  7. extract exta
  8. USERID ogg,PASSWORD ogg
  9. EXTTRAIL /u01/ogg/base/dirdat/ea
  10. table YZ.DD;
  11.  
  12. GGSCI (t1) 5> view param dpea
  13. extract dpea
  14. rmthost 10.0.0.32,mgrport 7809, compress
  15. rmttrail /u01/ogg/base/dirdat/t1
  16. table YZ.B;
  17. table YZ.DD;
  18.  
  19. GGSCI (t1) 7> info exta
  20. EXTRACT EXTA Last Started 2020-11-10 11:05 Status RUNNING
  21. Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
  22. Process ID 10744
  23. Log Read Checkpoint Oracle Redo Logs
  24. 2020-11-10 11:25:54 Seqno 353, RBA 3917824
  25. SCN 0.3276594 (3276594)
  26. GGSCI (t1) 8> info dpea
  27. EXTRACT DPEA Last Started 2020-11-10 11:05 Status RUNNING
  28. Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
  29. Process ID 10776
  30. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
  31. 2020-11-10 11:05:01.669087 RBA 1469
  32.  
  33. SQL> select * from dd;
  34. ID CC_NAME WITTIME
  35. ---------- ------------------------------ ------------------------------
  36. 2 2 03-JUN-20 02.34.37.000000 PM
  37. GGSCI (t2) 26> view param repa
  38. replicat repa
  39. userid ogg,password ogg
  40. assumetargetdefs
  41. HANDLECOLLISIONS
  42. discardfile /u01/ogg/base/dirrpt/repa.dsc
  43. MAP YZ.DD ,TARGET BAK_YZ.DD;
  44.  
  45. GGSCI (t2) 27> info repa
  46. REPLICAT REPA Last Started 2020-11-10 11:20 Status RUNNING
  47. Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
  48. Process ID 11023
  49. Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000051
  50. 2020-11-10 11:05:01.313791 RBA 1563
  51.  
  52. 2)目标端OGG复制进程重启, 复制进程对应的trail 文件seq不变
  53. GGSCI (t2) 28> stop repa
  54. GGSCI (t2) 29> start repa
  55.  
  56. 3)源端OGG投递进程重启,投递进程对应的trail 文件seq不变
  57. GGSCI (t1) 9> stop dpea
  58. GGSCI (t1) 10> start dpea
  59. GGSCI (t1) 13> info dpea
  60. EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
  61. Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
  62. Process ID 11117
  63. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000067
  64. First Record RBA 1469
  65.  
  66. 4)源端OGG抽取进程重启,抽取进程对应的trail 文件seq +1
  67. GGSCI (t1) 15> info exta,detail
  68. EXTRACT EXTA Last Started 2020-11-10 11:05 Status RUNNING
  69. Checkpoint Lag 00:00:00 (updated 00:00:09 ago)
  70. Process ID 10744
  71. Log Read Checkpoint Oracle Redo Logs
  72. 2020-11-10 11:30:15 Seqno 353, RBA 3919360
  73. SCN 0.3276690 (3276690)
  74. Target Extract Trails:
  75. Trail Name Seqno RBA Max MB Trail Type
  76. /u01/ogg/base/dirdat/ea 67 1469 20 EXTTRAIL
  77. GGSCI (t1) 16> stop exta
  78. GGSCI (t1) 17> start exta
  79. Target Extract Trails:
  80. Trail Name Seqno RBA Max MB Trail Type
  81. /u01/ogg/base/dirdat/ea 68 1469 20 EXTTRAIL
  82. 5)源端抽取进程seq +1之后,源端投递进程读取的文件 seq +1, 投递进程写入目标端seq 文件+1 ,目标端复制进程读取的seq 文件+1
  83. GGSCI (t1) 19> info dpea
  84. EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
  85. Checkpoint Lag 00:00:00 (updated 00:00:08 ago)
  86. Process ID 11117
  87. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
  88. 2020-11-10 11:31:58.380185 RBA 1469
  89.  
  90. GGSCI (t2) 45> info repa
  91. REPLICAT REPA Last Started 2020-11-10 11:28 Status RUNNING
  92. Checkpoint Lag 00:00:00 (updated 00:00:02 ago)
  93. Process ID 11132
  94. Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
  95. 2020-11-10 11:31:58.035041 RBA 1563
  96.  
  97. 6)源端{确认OGG链路处于同步状态}
  98. SQL> insert into dd values(3,'cc',sysdate);
  99. SQL> commit;
  100. GGSCI (t1) 22> info dpea
  101. EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
  102. Checkpoint Lag 00:00:00 (updated 00:00:00 ago)
  103. Process ID 11117
  104. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
  105. 2020-11-10 11:34:52.000000 RBA 2284
  106.  
  107. 目标端
  108. SQL> select * from dd;
  109. ID CC_NAME WITTIME
  110. ---------- ------------------------------ ------------------------------
  111. 3 cc 10-NOV-20 11.34.50.000000 AM
  112. 2 2 03-JUN-20 02.34.37.000000 PM
  113.  
  114. REPLICAT REPA Last Started 2020-11-10 11:28 Status RUNNING
  115. Checkpoint Lag 00:00:00 (updated 00:00:04 ago)
  116. Process ID 11132
  117. Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
  118. 2020-11-10 11:34:51.656002 RBA 2378

1.3模拟破坏目标端OGG应用Dump文件,如何处理

  1. 1)手工修改dump文件
  2. [ogg@t2 ~]$ vi /u01/ogg/base/dirdat/t1000000052
  3. 破坏文件
  4.  
  5. 2)源端插入1条测试数据
  6. SQL> insert into dd values(4,'cc',sysdate);
  7. SQL> commit;
  8.  
  9. 3OGG 复制进程Abend
  10. 2020-11-10 11:36:59 ERROR OGG-02171 Error reading LCR from data source. Status 509, data source type TrailDataSource.
  11. 2020-11-10 11:36:59 ERROR OGG-02191 Incompatible record 101 in /u01/ogg/base/dirdat/t1000000052, rba 2,378 when getting trail header.
  12. 2020-11-10 11:36:59 ERROR OGG-01668 PROCESS ABENDING.
  13.  
  14. 4)源端再次插入1条测试数据
  15. SQL> insert into dd values(5,'cc',sysdate);
  16. 1 row created.
  17. SQL> commit;
  18. GGSCI (t1) 38> info dpea
  19. EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
  20. Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
  21. Process ID 11117
  22. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
  23. 2020-11-10 13:25:29.000000 RBA 2604
  24. 此时,对于源端投递进程来说,eaxxx68 这个队列文件中,存在两条Insert记录;
  25. 对于目标端应用进程来说,repa t1xxx52队列文件中,应用第一条记录就报错了!


  26. 投递进程重新投递eaxxx68队列文件,这个文件被我们手工人为破坏了,【实际生产运维过程中,存在网络波动包损坏等,导致源端投递进程无法写入文件,导致OGG同步链路中断】,
    原本是想模拟这个场景,但是本次模拟投递正常,应用失败。
  27. GGSCI (t1) 40> info dpea
  28. EXTRACT DPEA Last Started 2020-11-10 11:30 Status RUNNING
  29. Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
  30. Process ID 11117
  31. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
  32. 2020-11-10 13:25:29.000000 RBA 2604
  33.  
  34. GGSCI (t1) 47> view param dpea
  35. extract dpea
  36. rmthost 10.0.0.32,mgrport 7809, compress
  37. rmttrail /u01/ogg/base/dirdat/t1
  38. table YZ.DD;

  39. 5 如何处理??? 既然是dump文件损坏,源端投递进程重新再次投递一个这个seqno文件不就可行? 使用etrollover前滚投递进程!
  1. GGSCI (t1) 55> alter EXTRACT dpea etrollover
  2. 2020-11-10 13:39:25 INFO OGG-01520 Rollover performed. For each affected output trail of Version 10 or higher format,
    after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's
    scan to the new trail file; it will not happen automatically.
  3. EXTRACT altered.
  4.  
  5. GGSCI (t1) 48> info dpea,detail
  6. EXTRACT DPEA Initialized 2020-11-10 11:30 Status STOPPED
  7. Checkpoint Lag 00:00:00 (updated 00:01:07 ago)
  8. Log Read Checkpoint File /u01/ogg/base/dirdat/ea000000068
  9. 2020-11-10 13:25:29.000000 RBA 2604
  10. Target Extract Trails:
  11. Trail Name Seqno RBA Max MB Trail Type
  12. /u01/ogg/base/dirdat/t1 53 0 20 RMTTRAIL
  13. Extract Source Begin End
  14. /u01/ogg/base/dirdat/ea000000068 * Initialized * 2020-11-10 13:25
  15. /u01/ogg/base/dirdat/ea000000068 2020-11-10 11:05 2020-11-10 13:25
  16. /u01/ogg/base/dirdat/ea000000067 2020-10-13 13:24 2020-11-10 11:05
  17. /u01/ogg/base/dirdat/ea000000066 2020-10-13 13:24 2020-10-13 13:24
  18. [ogg@t2 ~]$ ls -lrt /u01/ogg/base/dirdat/t1*
  19. GGSCI (t1) 49> start dpea
  20. 可以发现什么问题? OGG extract source 里面存着2eaxxx68 seqno文件,正常情况下只会出现1条,并且 end time一致,因此相当于这个seq文件重新投递。

  21. 6)目标端再次启动复制进程
  22. GGSCI (t2) 52> info repa
  23.  
  24. REPLICAT REPA Last Started 2020-11-10 11:28 Status ABENDED
  25. Checkpoint Lag 00:00:00 (updated 01:58:17 ago)
  26. Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
  27. 2020-11-10 11:34:51.656002 RBA 2378

GGSCI (t2) 58> start repa
GGSCI (t2) 59> info repa
REPLICAT REPA Last Started 2020-11-10 13:35 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:10 ago)
Process ID 12727
Log Read Checkpoint File /u01/ogg/base/dirdat/t1000000052
2020-11-10 13:25:28.699520 RBA 2698
SQL> select * from dd;
ID CC_NAME WITTIME
---------- ------------------------------ ------------------------------
3 cc 10-NOV-20 11.34.50.000000 AM
2 2 03-JUN-20 02.34.37.000000 PM
4 cc 10-NOV-20 11.37.19.000000 AM
5 cc 10-NOV-20 01.25.27.000000 PM