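OS version: Red Hat Enterprise Linux Server release 7.3 (Maipo)
Database version: 11.2.0.4.0
Architecture: RAC + single-instance Data Guard (DG)
Symptoms:
Node 1: the host went down
Node 2: reported the errors below
Alert log: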
Mon Sep 16 02:00:00 2019
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Mon Sep 16 02:20:59 2019
Reconfiguration started (old inc 4, new inc 6)
List of instances:
2 (myinst: 2)
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Sep 16 02:20:59 2019
Mon Sep 16 02:20:59 2019
LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 16 02:20:59 2019
LMS 3: 1 GCS shadows cancelled, 1 closed, 0 Xw survived
Mon Sep 16 02:20:59 2019
Mon Sep 16 02:20:59 2019
LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 5: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Mon Sep 16 02:20:59 2019
LMS 4: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon Sep 16 02:20:59 2019
minact-scn: master found reconf/inst-rec before recscn scan old-inc#:4 new-inc#:4
Post SMON to start 1st pass IR
Mon Sep 16 02:20:59 2019
Instance recovery: looking for dead threads
Beginning instance recovery of 1 threads
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
parallel recovery started with 32 processes
Started redo scan
Completed redo scan
read 579 KB redo, 187 data blocks need recovery
Mon Sep 16 02:21:03 2019
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p000_92487.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Mon Sep 16 02:21:03 2019
Mon Sep 16 02:21:03 2019
Mon Sep 16 02:21:03 2019
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p025_92901.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p027_92905.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p022_92893.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
Mon Sep 16 02:21:03 2019
Errors in file /u01/app/oracle/diag/rdbms/test/test2/trace/test2_p026_92903.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 11: Resource temporarily unavailable
Additional information: 3
Additional information: 128
Additional information: 202351108
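Errno 11 (EAGAIN) is what io_setup(2) returns when the requested AIO events would exceed the system-wide limit in /proc/sys/fs/aio-max-nr, so a quick check is to compare that limit with the current usage in /proc/sys/fs/aio-nr. The sketch below is illustrative only; the helper name aio_usage_pct is hypothetical and not part of any Oracle or Linux tool.

```shell
# Hypothetical helper: integer percentage of the AIO limit currently in use.
aio_usage_pct() {
    # $1 = current value of fs.aio-nr, $2 = value of fs.aio-max-nr
    [ "$2" -gt 0 ] || return 1
    echo $(( $1 * 100 / $2 ))
}

# Only read the live counters where the kernel exposes them (Linux).
if [ -r /proc/sys/fs/aio-nr ] && [ -r /proc/sys/fs/aio-max-nr ]; then
    nr=$(cat /proc/sys/fs/aio-nr)
    max=$(cat /proc/sys/fs/aio-max-nr)
    echo "aio-nr=$nr aio-max-nr=$max used=$(aio_usage_pct "$nr" "$max")%"
fi
```

A value close to 100% on the surviving node would be consistent with the parallel recovery slaves failing to reserve AIO resources.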
--Grid log
2019-05-08 13:42:25.391:
[crsd(7600)]CRS-2769:Unable to failover resource 'ora.test.db'.
2019-09-16 02:20:43.924:
[cssd(5487)]CRS-1612:Network communication with node test1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.870 seconds
2019-09-16 02:20:51.925:
[cssd(5487)]CRS-1611:Network communication with node test1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.870 seconds
2019-09-16 02:20:55.925:
[cssd(5487)]CRS-1610:Network communication with node test1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.870 seconds
2019-09-16 02:20:58.797:
[cssd(5487)]CRS-1632:Node test1 is being removed from the cluster in cluster incarnation 452719139
2019-09-16 02:20:58.801:
[cssd(5487)]CRS-1601:CSSD Reconfiguration complete. Active nodes are test2 .
2019-09-16 02:20:58.802:
[crsd(7600)]CRS-5504:Node down event reported for node 'test1'.
2019-09-16 02:21:01.997:
[crsd(7600)]CRS-2773:Server 'test1' has been removed from pool 'Generic'.
2019-09-16 02:21:01.997:
[crsd(7600)]CRS-2773:Server 'test1' has been removed from pool 'ora.test'.
2019-09-16 03:28:29.268:
[cssd(5487)]CRS-1601:CSSD Reconfiguration complete. Active nodes are test1 test2 .
2019-09-16 03:29:14.720:
[crsd(7600)]CRS-2772:Server 'test1' has been assigned to pool 'Generic'.
2019-09-16 03:29:14.720:
[crsd(7600)]CRS-2772:Server 'test1' has been assigned to pool 'ora.test'.
---Host log
Sep 16 02:21:01 test2 systemd: Started Session 1138864 of user root.
Sep 16 02:21:01 test2 systemd: Starting Session 1138864 of user root.
Sep 16 02:21:01 test2 systemd: Started Session 1138865 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138865 of user oracle.
Sep 16 02:21:01 test2 systemd: Started Session 1138866 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138866 of user oracle.
Sep 16 02:21:01 test2 systemd: Started Session 1138867 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138867 of user oracle.
Sep 16 02:21:01 test2 systemd: Started Session 1138868 of user oracle.
Sep 16 02:21:01 test2 systemd: Starting Session 1138868 of user oracle.
Sep 16 02:21:01 test2 su: (to oracle) root on none
Sep 16 02:21:01 test2 su: (to oracle) root on none
Sep 16 02:21:01 test2 systemd: Removed slice user-0.slice.
Sep 16 02:21:01 test2 systemd: Stopping user-0.slice.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Registering new address record for 1.3.10.8 on bond0.IPv4.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Withdrawing address record for 1.3.10.8 on bond0.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Registering new address record for 1.3.10.8 on bond0.IPv4.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Withdrawing address record for 1.3.10.8 on bond0.
Sep 16 02:21:01 test2 avahi-daemon[3186]: Registering new address record for 1.3.10.8 on bond0.IPv4.
Sep 16 02:22:01 test2 systemd: Created slice user-0.slice.
Sep 16 02:22:01 test2 systemd: Starting user-0.slice.
Sep 16 02:22:01 test2 systemd: Started Session 1138869 of user root.
--Node 1 logs:
--Alert log
Mon Sep 16 02:00:00 2019
Closing scheduler window
Closing Resource Manager plan via scheduler window
Clearing Resource Manager plan via parameter
Mon Sep 16 03:29:22 2019
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = UNLIMITED
--ASM alert log: ORA-27090 had already been appearing long before this incident
Tue May 21 17:21:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_208122.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Additional information: 128
Tue May 21 22:15:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_259760.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3
Additional information: 128
Additional information: 1
Tue May 21 22:21:02 2019
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_269758.trc:
ORA-27090: Unable to reserve kernel resources for asynchronous disk I/O
Linux-x86_64 Error: 2: No such file or directory
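Because ORA-27090 had been logged for months before the outage, it can help to count how often it appears in each alert log. A minimal sketch, assuming a plain-text alert log; the helper name count_ora27090 is hypothetical, and the file name in the example comment follows the usual alert_<SID>.log convention rather than anything shown above.

```shell
# Hypothetical helper: count the ORA-27090 lines in an alert log.
count_ora27090() {
    # grep -c prints the number of matching lines; it exits 1 when there
    # are none, so "|| true" keeps the helper usable under "set -e".
    grep -c 'ORA-27090' "$1" || true
}

# Example (file name assumed):
#   count_ora27090 /u01/app/grid/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log
```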
---MOS solution:
--Apply the change on both nodes
vi /etc/sysctl.conf
fs.aio-max-nr = 3145728
sysctl -p
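When the same edit has to be repeated on several nodes, it can be scripted instead of done by hand in vi. The following is a sketch under stated assumptions: set_sysctl_param is a hypothetical helper, not a system command, and it assumes the file uses simple "key = value" lines.

```shell
# Hypothetical helper: set "key = value" in a sysctl-style file,
# replacing an existing entry or appending a new one.
set_sysctl_param() {
    file=$1 key=$2 value=$3
    # Escape dots so the key is matched literally in the regular expressions.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file"; then
        # Rewrite the existing line(s) via a temporary file (portable, no sed -i).
        sed "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$file.tmp" &&
            cat "$file.tmp" > "$file" && rm -f "$file.tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Example (as root, on each node):
#   set_sysctl_param /etc/sysctl.conf fs.aio-max-nr 3145728 && sysctl -p
```

Replacing an existing entry rather than always appending avoids ending up with two conflicting fs.aio-max-nr lines in the file.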
---Verify that the change took effect
--Run as the grid user
cluvfy comp sys -n all -p crs -verbose
--The output is as follows
Check: Kernel parameter for "aio-max-nr"
  Node Name         Current       Configured    Required      Status        Comment
  ----------------  ------------  ------------  ------------  ------------  ------------
  test2             3145728       3145728       1048576       passed
  test1             3145728       3145728       1048576       passed
Result: Kernel parameter check passed for "aio-max-nr"
Reference: MOS Doc ID 579108.1