Oracle ADG备库的2节点查询wrong result问题
应用组的向我反馈了一个很奇葩的问题。
在备库查询同样的SQL,结果和主库不一样。如下:
19:56:19 SYS@xxxxxx2(483)> select * from user.tablename t where t.seqse_no in ('22020002401'); no rows selected Elapsed: 00:00:00.01 19:57:18 SYS@xxxxxx2(483)> select count(*) from user.tablename t where t.seqse_no='22020002401'; COUNT(*) ---------- 8 Elapsed: 00:00:00.01 19:57:38 SYS@xxxxxx2(483)> select * from user.tablename t where t.seqse_no = '22020002401'; no rows selected Elapsed: 00:00:00.01 19:58:43 SYS@xxxxxx2(483)> set pagesize 9999 long 9999 line 500 timing on 19:58:50 SYS@xxxxxx2(483)> alter session set statistics_level=all; Session altered. Elapsed: 00:00:00.00 19:58:50 SYS@xxxxxx2(483)> select count(*) from user.tablename t where t.seqse_no='22020002401'; COUNT(*) ---------- 8 Elapsed: 00:00:00.01 19:58:52 SYS@xxxxxx2(483)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------- SQL_ID 903nj19yx1ku0, child number 1 ------------------------------------- select count(*) from user.tablename t where t.seqse_no='22020002401' Plan hash value: 533856606 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 4 | | 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 4 | |* 2 | INDEX RANGE SCAN| IDX_TDH_SEQSE_NO | 1 | 4 | 8 |00:00:00.01 | 4 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T"."SEQSE_NO"='22020002401') 20 rows selected. Elapsed: 00:00:00.08 19:58:53 SYS@xxxxxx2(483)> select * from user.tablename t where t.seqse_no = '22020002401'; no rows selected Elapsed: 00:00:00.01 19:59:01 SYS@xxxxxx2(483)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last')); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------------------------------------------- SQL_ID dtj0crstr2tgb, child number 1 ------------------------------------- select * from user.tablename t where t.seqse_no ='22020002401' Plan hash value: 176717312 ----------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | ----------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 4 | | 1 | TABLE ACCESS BY GLOBAL INDEX ROWID| TABLENAME | 1 | 4 | 0 |00:00:00.01 | 4 | |* 2 | INDEX RANGE SCAN | IDX_TDH_SEQSE_NO | 1 | 4 | 8 |00:00:00.01 | 4 | ----------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("T"."SEQSE_NO"='22020002401') 19 rows selected. Elapsed: 00:00:00.04
同时,警告日志可能伴有ORA-600的报错。
Mon Jan 03 11:43:12 2022 Errors in file /u01/app/oracle/diag/rdbms/xxxxxxstb/xxxxxx2/trace/xxxxxx2_ora_28759.trc (incident=25033): ORA-00600: 内部错误代码, 参数: [12406], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /u01/app/oracle/diag/rdbms/xxxxxxstb/xxxxxx2/incident/incdir_25033/xxxxxx2_ora_28759_i25033.trc Errors in file /u01/app/oracle/diag/rdbms/xxxxxxstb/xxxxxx2/incident/incdir_25033/xxxxxx2_ora_28759_i25033.trc:
xxxxxx2_ora_28759.trc部分内容:
Incident 25033 created, dump file: /u01/app/oracle/diag/rdbms/xxxxxxstb/xxxxxx2/incident/incdir_25033/xxxxxx2_ora_28759_i25033.trc ORA-00600: 内部错误代码, 参数: [12406], [], [], [], [], [], [], [], [], [], [], []
xxxxxx2_ora_28759_i25033.trc部分内容:
*** 2022-01-03 11:43:12.994 *** SESSION ID:(861.5305) 2022-01-03 11:43:12.994 *** CLIENT ID:() 2022-01-03 11:43:12.994 *** SERVICE NAME:() 2022-01-03 11:43:12.994 *** MODULE NAME:(PL/SQL Developer) 2022-01-03 11:43:12.994 *** ACTION NAME:(SQL Window - 常用脚本.sql) 2022-01-03 11:43:12.994 Dump continued from file: /u01/app/oracle/diag/rdbms/xxxxxxstb/xxxxxx2/trace/xxxxxx2_ora_28759.trc ORA-00600: 内部错误代码, 参数: [12406], [], [], [], [], [], [], [], [], [], [], [] ========= Dump for incident 25033 (ORA 600 [12406]) ======== *** 2022-01-03 11:43:12.995 dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=du0p5v7647sbs) -----
这种一般都是BUG导致的问题,而且这类查询结果有问题的叫“wrong result”,查询MOS,根据Known issues of Wrong Results in Active Data Guard(ADG) (文档 ID 2490469.1),
unpublished Bug 19614585 - Wrong Results / ORA-600 [kksgaGetNoAlloc_Int0] / ORA-600 [12406] / ORA-7445 / ORA-8103 / ORA-1555 from query on RAC ADG Physical Standby Database (Note:19614585.8)比较符合:
Affects versions: Versions BELOW 12.2
Versions confirmed as being affected12.1.0.2 (Server Patch Set)
11.2.0.4Symptoms:
This bug is only relevant when using Real Application Clusters (RAC) and Partitioned TablesThis problem is caused by the fix of bug 15969429.
If the fix of bug 15969429 is present (which is the case of 11.2.0.4.0 and 12.1.0.1) and if running a query against
a PARTITION object on a RAC Active Dataguard (ADG) Physical Standby database causes Wrong Results or errors like:ORA-600 [kksgaGetNoAlloc_Int0]
ORA-600 [qesmaGetIdxSeg2]
ORA-600 [kdsgrp1]
ORA-600 [12406]
ORA-7445 [kqrLocalInvalidateByHash]
ORA-7445 [kslgetl]
ORA-8103
ORA-1555and the same query runs without problems on the primary database, then it could be a case of this bug.
As an example of wrong results, for a RAC ADG database with two instances, the ADG instance 1 may see a different number of rows for the PARTITION table than ADG instance 2.
fixed:
12.2.0.1
12.1.0.2 Bundle Patch 3 for Engineered Systems/DB in-Memory(DBBP)
11.2.0.4 Bundle Patch 17 for Exadata Database(Jul 2015)
11.2.0.4.160229 (Feb 2016) Bundle Patch for Windows PlatformsSolution: Check for availability of One off patch using Patch 19614585.
ORA-600[12406]以及备库1查询和2查询是不一样的,本次有问题的SQL涉及的表也是一个分区表。
下载补丁p19614585_11204200114_Linux-x86-64.zip后在测试库上做冲突检查后发现有冲突补丁存在没法打......
[oracle@dev-testdb 19614585]$ opatch lspatches 18034737; 13446078; 14202396; 31668908;OJVM PATCH SET UPDATE 11.2.0.4.201020 31537677;Database Patch Set Update : 11.2.0.4.201020 (31537677) 29938455;OCW Patch Set Update : 11.2.0.4.191015 (29938455) OPatch succeeded. [oracle@dev-testdb 19614585]$ opatch prereq CheckConflictAgainstOHWithDetail -ph ./ Oracle Interim Patch Installer version 11.2.0.3.28 Copyright (c) 2022, Oracle Corporation. All rights reserved. PREREQ session Oracle Home : /u01/app/oracle/product/11.2.0/db_1 Central Inventory : /u01/app/oraInventory from : /u01/app/oracle/product/11.2.0/db_1/oraInst.loc OPatch version : 11.2.0.3.28 OUI version : 11.2.0.4.0 Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2022-01-11_17-30-44PM_1.log Invoking prereq "checkconflictagainstohwithdetail" ZOP-40: The patch(es) has conflicts with other patches installed in the Oracle Home (or) among themselves. Prereq "checkConflictAgainstOHWithDetail" failed. Summary of Conflict Analysis: There are no patches that can be applied now. Following patches have conflicts. Please contact Oracle Support and get the merged patch of the patches : 19614585, 31537677 Whole composite patch Conflicts/Supersets are: Patch : 19614585 Conflict with Composite Patch 31537677 Detail Conflicts/Supersets for each patch are: Patch : 19614585 Conflict with Sub-Patch 30670774 Conflict details: /u01/app/oracle/product/11.2.0/db_1/lib/libgeneric11.a:kgl.o OPatch succeeded.
由于目前备库作用除了容灾外,也是给内部人员做只读操作的分流分压的。
像是PLSQL连接是长时间保持连着的,全部加起来可以达到100多个,连到备库可以分连接数。
另外有相当一部分的统计需要全表扫描表,连到备库可以分摊IO和CPU的压力。
无法应用补丁,最后决定将2节点实例关闭在观察。