一、疑问,什么情况下主库会发出一个会话连接备库
可以根据上述问题,进行一定猜测。
例如,主库启动时,远程归档线程自动发出远程连接dg备库的请求;
以及远程归档线程状态参数,defer,enable切换时会尝试发起新的连接;
二、测试
2.1 主备连接正常,主库shutdown immediate
Primary SQL> shutdown immediate; Standby ps -ef|grep LOCAL=NO 之前有来自主库远程发起的5个session,均断开 dg_alert 2021-04-20T07:53:39.624556+08:00 RFS[13]: Assigned to RFS process (PID:51957) RFS[13]: Possible network disconnect with primary database 其中51957 是其中一个主库远程发起session 在备库主机上的pid
2.2 主备连接正常,主库startup
Primary
SQL> startup
DB_alert
2021-04-20T08:00:48.636757+08:00
TT02: Standby redo logfile selected for thread 1 sequence 98 for destination LOG_ARCHIVE_DEST_2
2021-04-20T08:00:48.761592+08:00
ARC1: Standby redo logfile selected for thread 1 sequence 97 for destination LOG_ARCHIVE_DEST_2
Standby
$ ps -efmT|grep LOCAL=NO
oracle 54578
oracle 54580
oracle 54582
DG_ALERT
2021-04-20T08:00:48.599128+08:00
Primary database is in MAXIMUM PERFORMANCE mode
RFS[14]: Assigned to RFS process (PID:54578)
RFS[14]: Selected log 4 for T-1.S-98 dbid 1594123398 branch 1066342665
2021-04-20T08:00:48.745723+08:00
RFS[15]: Assigned to RFS process (PID:54580)
RFS[15]: Selected log 5 for T-1.S-97 dbid 1594123398 branch 1066342665
2021-04-20T08:00:48.851883+08:00
Recovery: global flush, serial recovery
主库启动时,远程归档线程会连接备库,备库RFS进程接收归档信息进行备库本地化存储归档日志。
2.3 备库kill远程会话,关闭dg监听
备库 $ ps -ef|grep LOCAL=NO oracle 3097 1 0 05:07 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 3099 1 0 05:07 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 3101 1 0 05:07 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 3103 1 0 05:07 ? 00:00:00 oraclec12 (LOCAL=NO) SQL> select s.sid,s.serial#,p.program,s.username,p.username,p.background,s.program,s.LAST_CALL_ET,s.LOGON_TIME,s.status from v$process p,v$session s where p.addr=s.paddr and p.spid in(3097,3099,3101,3103); SID SERIAL# PROGRAM USERNAME USERNAME BA PROGRAM LAST_CALL_ET LOGON_TIME STATUS ---------- ---------- ------------------------- ---------- ---------- -- ------------------------- ------------ ------------ ---------------- 51 18042 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 9 09-MAR-21 INACTIVE 49 51581 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 853 09-MAR-21 INACTIVE 50 45678 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 849 09-MAR-21 INACTIVE 47 5930 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 0 09-MAR-21 INACTIVE 主库 SQL> shutdown immediate; 备库 远程session 都断开连接! 备库$lsnrctl stop 主库 SQL> startup select DEST_NAME , STATUS , RECOVERY_MODE from v$archive_dest_status where dest_name='LOG_ARCHIVE_DEST_2'; DEST_NAME STATUS RECOVERY_MODE -------------------- ------------------ --------------- LOG_ARCHIVE_DEST_2 ERROR IDLE 备库 SQL> host lsnrctl start $ ps -ef|grep -v grep|grep LOCAL=NO|wc -l 0 alter system set log_archive_dest_state_2=defer; alter system set log_archive_dest_state_2=enable; 备库listener TIMESTAMP * CONNECT DATA [* PROTOCOL INFO] * EVENT [* SID] * RETURN CODE WARNING: Subscription for node down event still pending 09-MAR-2021 05:28:48 * (CONNECT_DATA=(CID=(PROGRAM=)(HOST=c12)(USER=oracle))(COMMAND=status)(ARGUMENTS=64)(SERVICE=LISTENER)(VERSION=203424000)) * status * 0 2021-03-09T05:31:30.969614+08:00 09-MAR-2021 05:31:30 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle@c11)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54711)) * establish * orcldg * 0 09-MAR-2021 05:31:30 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle@c11)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54712)) * establish * orcldg * 0 2021-03-09T05:31:32.030043+08:00 09-MAR-2021 05:31:32 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54713)) * establish * orcldg * 0 09-MAR-2021 05:31:32 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54714)) * establish * orcldg * 0 2021-03-09T05:31:33.086332+08:00 09-MAR-2021 05:31:33 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54715)) * establish * orcldg * 0 09-MAR-2021 05:31:33 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54716)) * establish * orcldg * 0 09-MAR-2021 05:31:33 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle@c11)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54717)) * establish * orcldg * 0 09-MAR-2021 05:31:33 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54719)) * establish * orcldg * 0 09-MAR-2021 05:31:33 * (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=orcldg)(CID=(PROGRAM=oracle@c11)(HOST=c11)(USER=oracle))) * (ADDRESS=(PROTOCOL=tcp)(HOST=10.0.0.98)(PORT=54718)) * establish * orcldg * 0 $ ps -ef|grep LOCAL=NO|grep -v grep oracle 4874 1 0 05:31 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 4876 1 0 05:31 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 4878 1 0 05:31 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 4880 1 0 05:31 ? 00:00:00 oraclec12 (LOCAL=NO) oracle 4882 1 0 05:31 ? 00:00:00 oraclec12 (LOCAL=NO) 也就是说,12C 主备之间存在多个session select s.sid,s.serial#,p.program,s.username,p.username,p.background,s.program,s.LAST_CALL_ET,s.LOGON_TIME,s.status from v$process p,v$session s where p.addr=s.paddr and p.spid in(4874,4876,4878,4880,4882); SID SERIAL# PROGRAM USERNAME USERNAME BA PROGRAM LAST_CALL_ET LOGON_TIME STATUS ---------- ---------- ------------------------- ---------- ---------- -- ------------------------- ------------ ------------ ---------------- 27 44814 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 55 09-MAR-21 INACTIVE 56 62720 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 1 09-MAR-21 INACTIVE 55 38475 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 115 09-MAR-21 INACTIVE 28 3511 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 115 09-MAR-21 INACTIVE 51 40708 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 115 09-MAR-21 INACTIVE
可以发现备库关闭监听,也就代表拒绝新的连接。
而手工在dg备库kill远程会话,就是将已有的连接kill,因此主备之间连通性断开!
然后重置远程归档线程参数的状态defer,enable本质是让主库重新发起一次连接dg备库的请求!
2.4 dg备库取消MRP进程
alter database recover managed standby database cancel;
会话还在,主备连通性无改变,只是dg同步中断。
2.5 dg kill 远程会话
SQL> alter system kill session '56,62720' immediate; 主库报Alert 2021-03-09T05:37:27.839588+08:00 TT02: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135) TT02: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned 2021-03-09T05:37:27.853485+08:00 Errors in file /picclife/app/oracle/diag/rdbms/orcl/c11/trace/c11_tt02_4865.trc: ORA-03135: 连接失去联系 Error 3135 for archive log file 2 to 'tns_standby' 2021-03-09T05:37:27.855995+08:00 Errors in file /picclife/app/oracle/diag/rdbms/orcl/c11/trace/c11_tt02_4865.trc: ORA-03135: 连接失去联系 2021-03-09T05:37:27.856095+08:00 Errors in file /picclife/app/oracle/diag/rdbms/orcl/c11/trace/c11_tt02_4865.trc: ORA-03135: 连接失去联系 alter system set log_archive_dest_state_2=defer; alter system set log_archive_dest_state_2=enable; SQL> select s.sid,s.serial#,p.program,s.username,p.username,p.background,s.program,s.LAST_CALL_ET,s.LOGON_TIME,s.status from v$process p,v$session s where p.addr=s.paddr and s.PROGRAM='oracle@c11 (TNS V1-V3)'; SID SERIAL# PROGRAM USERNAME USERNAME BA PROGRAM LAST_CALL_ET LOGON_TIME STATUS ---------- ---------- ------------------------- ---------- ---------- -- ------------------------- ------------ ------------ ---------------- 60 28802 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 34 09-MAR-21 INACTIVE 56 17436 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 1 09-MAR-21 INACTIVE 55 38475 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 481 09-MAR-21 INACTIVE ALTER DATABASE RECOVER MANAGED STANDBY DATABASE archivelog DISCONNECT FROM SESSION; alter database recover managed standby database cancel; recover managed standby database disconnect from session; SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING ARCHIVED LOGFILE DISCONNECT; MRP0 WAIT_FOR_LOG SID SERIAL# PROGRAM USERNAME USERNAME BA PROGRAM LAST_CALL_ET LOGON_TIME STATUS ---------- ---------- ------------------------- ---------- ---------- -- ------------------------- ------------ ------------ ---------------- 55 57549 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 13 09-MAR-21 INACTIVE 56 17436 oracle@c12 PUBLIC oracle oracle@c11 (TNS V1-V3) 1 09-MAR-21 INACTIVE
上述问题,是我们重置参数修复的,新的问题又来了,备库kill 远程会话,Oracle能自动解决这个问题吗? 会话断开意味着主备连通性不正常,无法传输主库redo!
1) dg kill session ,primary db alert
2021-04-20T08:21:50.887515+08:00
TT02: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)
TT02: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
2021-04-20T08:21:50.887683+08:00
Errors in file /picclife/app/oracle/diag/rdbms/orcl/c11/trace/c11_tt02_10879.trc:
ORA-03135: 连接失去联系
Error 3135 for archive log file 2 to 'tns_standby'
2021-04-20T08:21:50.891152+08:00
Errors in file /picclife/app/oracle/diag/rdbms/orcl/c11/trace/c11_tt02_10879.trc:
ORA-03135: 连接失去联系
2021-04-20T08:21:50.892138+08:00
Errors in file /picclife/app/oracle/diag/rdbms/orcl/c11/trace/c11_tt02_10879.trc:
ORA-03135: 连接失去联系
2) 查询dg连通性状态
SQL> select dest_id,error from v$archive_dest;
DEST_ID ERROR
---------- -----------------------------------------------------------------
2 ORA-03135: connection lost contact
3)主库dml测试!
SQL> insert into b values(1);
1 row created.
SQL> commit;
dg 无法查询数据!
4)primary switch archive log
SQL> alter system switch logfile;
SQL> alter system archive log current;
主库多次切换归档,但是由于远程归档报错,并未重新发起连接dg的请求!
SQL> select max(sequence#),thread# from v$archived_log group by thread#;
MAX(SEQUENCE#) THREAD#
-------------- ----------
102 1
并且dg配置了fal server 追gap的情况未出现,因为陷入了一个思维陷阱!
dg sequence 最大 97, gap查询不出数据!
MAX(SEQUENCE#) THREAD#
-------------- ----------
97 1
5) wait for 6 minute, 主库再次尝试发出请求,dg 会话,连通性自动恢复正常,请注意关注时间点顺序!
Primary
ORA-03135: 连接失去联系
2021-04-20T08:21:50.892138+08:00
2021-04-20T08:26:20.347305+08:00
ALTER SYSTEM ARCHIVE LOG
2021-04-20T08:26:20.349898+08:00
Thread 1 cannot allocate new log, sequence 101
Checkpoint not complete
Current log# 1 seq# 100 mem# 0: /picclife/app/oracle/oradata/orcl/redo01.log
2021-04-20T08:26:22.826568+08:00
Thread 1 advanced to log sequence 101 (LGWR switch)
Current log# 2 seq# 101 mem# 0: /picclife/app/oracle/oradata/orcl/redo02.log
2021-04-20T08:26:22.838124+08:00
Archived Log entry 192 added for T-1.S-100 ID 0x5f054486 LAD:1
2021-04-20T08:26:22.843289+08:00
ALTER SYSTEM ARCHIVE LOG
2021-04-20T08:26:22.848645+08:00
Thread 1 advanced to log sequence 102 (LGWR switch)
Current log# 3 seq# 102 mem# 0: /picclife/app/oracle/oradata/orcl/redo03.log
2021-04-20T08:26:22.851459+08:00
Archived Log entry 193 added for T-1.S-101 ID 0x5f054486 LAD:1
2021-04-20T08:26:23.299206+08:00
ALTER SYSTEM ARCHIVE LOG
2021-04-20T08:26:23.301270+08:00
Thread 1 cannot allocate new log, sequence 103
Checkpoint not complete
Current log# 3 seq# 102 mem# 0: /picclife/app/oracle/oradata/orcl/redo03.log
2021-04-20T08:26:25.863465+08:00
Thread 1 advanced to log sequence 103 (LGWR switch)
Current log# 1 seq# 103 mem# 0: /picclife/app/oracle/oradata/orcl/redo01.log
2021-04-20T08:26:25.866709+08:00
Archived Log entry 194 added for T-1.S-102 ID 0x5f054486 LAD:1
2021-04-20T08:27:49.298684+08:00
TT04: Standby redo logfile selected for thread 1 sequence 103 for destination LOG_ARCHIVE_DEST_2
2021-04-20T08:28:49.448793+08:00
ARC3: Standby redo logfile selected for thread 1 sequence 98 for destination LOG_ARCHIVE_DEST_2
Standby
2021-04-20T08:27:49.287742+08:00
Primary database is in MAXIMUM PERFORMANCE mode
RFS[16]: Assigned to RFS process (PID:56268)
RFS[16]: Selected log 5 for T-1.S-103 dbid 1594123398 branch 1066342665
2021-04-20T08:28:49.348064+08:00
RFS[17]: Assigned to RFS process (PID:56328)
RFS[17]: Opened log for T-1.S-100 dbid 1594123398 branch 1066342665
2021-04-20T08:28:49.353374+08:00
RFS[18]: Assigned to RFS process (PID:56330)
RFS[18]: Opened log for T-1.S-99 dbid 1594123398 branch 1066342665
2021-04-20T08:28:49.361200+08:00
Archived Log entry 17 added for thread 1 sequence 99 rlc 1066342665 ID 0x5f054486 LAD2 :
2021-04-20T08:28:49.391236+08:00
Archived Log entry 18 added for thread 1 sequence 100 rlc 1066342665 ID 0x5f054486 LAD2 :
RFS[18]: Opened log for T-1.S-101 dbid 1594123398 branch 1066342665
RFS[17]: Opened log for T-1.S-102 dbid 1594123398 branch 1066342665
2021-04-20T08:28:49.403222+08:00
Archived Log entry 19 added for thread 1 sequence 101 rlc 1066342665 ID 0x5f054486 LAD2 :
2021-04-20T08:28:49.407801+08:00
Archived Log entry 20 added for thread 1 sequence 102 rlc 1066342665 ID 0x5f054486 LAD2 :
2021-04-20T08:28:49.434004+08:00
RFS[19]: Assigned to RFS process (PID:56332)
RFS[19]: Selected log 4 for T-1.S-98 dbid 1594123398 branch 1066342665
2021-04-20T08:28:49.462006+08:00