--Node 1 alert log
Thu Jun 20 03:02:18 2019
Thread 1 advanced to log sequence 2092 (LGWR switch)
Current log# 11 seq# 2092 mem# 0: +DATA/test/onlinelog/redo11_1.log
Current log# 11 seq# 2092 mem# 1: +DATA/test/onlinelog/redo11_2.log
Thu Jun 20 03:02:18 2019
LNS: Standby redo logfile selected for thread 1 sequence 2092 for destination LOG_ARCHIVE_DEST_2
Thu Jun 20 03:02:20 2019
Archived Log entry 5703 added for thread 1 sequence 2091 ID 0x9b8db1ed dest 1:
Thu Jun 20 03:03:53 2019
skgxpvfynet: mtype: 61 process 316751 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /u01/app/oracle/diag/rdbms/test/test1/trace/test1_ora_316751.trc (incident=560025):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /u01/app/oracle/diag/rdbms/test/test1/incident/incdir_560025/test1_ora_316751_i560025.trc
Thu Jun 20 03:03:54 2019
Dumping diagnostic data in directory=[cdmp_20190620030354], requested by (instance=1, osid=316751), summary=[incident=560025].
opiodr aborting process unknown ospid (316751) as a result of ORA-603
Thu Jun 20 03:04:03 2019
Sweep [inc][560025]: completed
Sweep [inc2][560025]: completed
Thu Jun 20 03:06:15 2019
Thread 1 advanced to log sequence 2093 (LGWR switch)
Current log# 5 seq# 2093 mem# 0: +DATA/test/onlinelog/redo05_1.log
Current log# 5 seq# 2093 mem# 1: +DATA/test/onlinelog/redo05_2.log
Thu Jun 20 03:06:16 2019
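The OS status in the ORA-27300 line (105) is the Linux errno ENOBUFS, which matches the ORA-27301 text "No buffer space available". To see how often the error recurs, a minimal sketch like the following could pull the status codes out of an alert log. The helper name `ora27300_status` is hypothetical, and the alert log path in the comment is an assumption based on the trace directory shown above.

```shell
# Hypothetical helper: print the OS status code from every ORA-27300 line on stdin.
ora27300_status() {
    sed -n 's/.*ORA-27300:.*failed with status: \([0-9][0-9]*\).*/\1/p'
}

# Example usage (path is an assumption; adjust to the real alert log location):
# ora27300_status < /u01/app/oracle/diag/rdbms/test/test1/trace/alert_test1.log | sort | uniq -c
```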
---Trace (.trc) file: at first glance this looks like a problem related to the server NIC MTU value
*** 2019-06-20 03:03:53.395
*** CLIENT ID:() 2019-06-20 03:03:53.395
*** SERVICE NAME:() 2019-06-20 03:03:53.395
*** MODULE NAME:() 2019-06-20 03:03:53.395
*** ACTION NAME:() 2019-06-20 03:03:53.395
SKGXP:[7f5ab0ef57c0.0]{0}: SKGXPVFYNET: Socket self-test could not verify successful transmission of 32768 bytes (mtype 61).
SKGXP:[7f5ab0ef57c0.1]{0}: The network is required to support UDP protocol sends of this size. Socket is bound to 169.254.16.188.
SKGXP:[7f5ab0ef57c0.2]{0}: phase 'send', 0 tries, 100 loops, 32354 ms (last)
struct ksxpp * ksxppg_ [0xc11fde0, 0x7f5aadaf5588) = 0x7f5aadaf5580
Dump of memory from 0x00007F5AADAF5580 to 0x00007F5AADAF6AB0
---Database status
select INST_ID,INSTANCE_NAME,STARTUP_TIME from gv$instance;
--Commands to paste into CRT
set lines 1000
alter session set nls_date_format='yyyymmdd hh24:mi:ss';
set time on
select INST_ID,INSTANCE_NAME,STARTUP_TIME from gv$instance;
09:42:29 SYS@test1(test1)> set lines 1000
09:43:30 SYS@test1(test1)> alter session set nls_date_format='yyyymmdd hh24:mi:ss';
Session altered.
09:43:31 SYS@test1(test1)> set time on
09:43:31 SYS@test1(test1)> select INST_ID,INSTANCE_NAME,STARTUP_TIME from gv$instance;
INST_ID INSTANCE_NAME STARTUP_TIME
---------- ---------------- -----------------
1 test1 20190509 15:30:08
2 test2 20190507 21:37:04
---Node 2 log
Thu Jun 20 03:03:54 2019
Dumping diagnostic data in directory=[cdmp_20190620030354], requested by (instance=1, osid=316751), summary=[incident=560025].
Thu Jun 20 03:10:10 2019
Thread 2 advanced to log sequence 912 (LGWR switch)
Current log# 6 seq# 912 mem# 0: +DATA/test/onlinelog/redo06_1.log
Current log# 6 seq# 912 mem# 1: +DATA/test/onlinelog/redo06_2.log
---Reference: official documentation
My Oracle Support Doc ID 2041723.1
CAUSE
This happens due to less space available for network buffer reservation.
SOLUTION
1. On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers, reducing the probability of low-buffer-space conditions.
*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***
On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.
On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here 'n' is the number of NUMA nodes.
2. Additionally, the MTU value should be modified as below
#ifconfig lo mtu 16436
To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :
MTU=16436
Save the file and restart the network service to load the changes
#service network restart
Note: While making the changes on CRS nodes, if the network is restarted while CRS is up, it can hang CRS. So cluster services should be stopped prior to the network restart.
---Unofficial blog post (unverified)
http://ju.outofmemory.cn/entry/76102
---Actual steps, run on both nodes
[root@test1 bin]# ifconfig lo mtu 16436
[root@test1 bin]#
[root@test1 bin]# vi /etc/sysconfig/network-scripts/ifcfg-lo
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
# If you're having problems with gated making 127.0.0.0/8 a martian,
# you can change this to something else (255.255.255.255, for example)
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback
MTU=16436
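To confirm that the runtime MTU and the persisted MTU agree after the edit, a small check like the one below may help. This is a sketch only: the helper names `persisted_mtu` and `runtime_mtu` are hypothetical, and it assumes a Linux host that exposes `/sys/class/net/<iface>/mtu`.

```shell
# Hypothetical helper: print the MTU persisted in an ifcfg-style file (no output if unset).
persisted_mtu() {
    sed -n 's/^MTU=\([0-9][0-9]*\).*/\1/p' "$1"
}

# Hypothetical helper: print the MTU the kernel currently uses for an interface.
runtime_mtu() {
    cat "/sys/class/net/$1/mtu"
}

cfg=/etc/sysconfig/network-scripts/ifcfg-lo
if [ -r "$cfg" ]; then
    echo "persisted: $(persisted_mtu "$cfg")  runtime: $(runtime_mtu lo)"
fi
```

Both values should read 16436 once the change has been applied and the network service restarted.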
---Restart the network service
# systemctl stop network
# systemctl start network
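The reference note warns that restarting the network while CRS is up can hang CRS, so the restart is probably best wrapped in a clusterware stop and start, one node at a time. The following is a sketch under assumptions: it is run as root, `crsctl` from the Grid Infrastructure home is on the PATH, and the service is named `network` as in the commands above. The `run` wrapper and function name are hypothetical. By default it only prints the commands.

```shell
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute as root.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop clusterware on this node, restart the network, then start clusterware again.
# The && chain stops at the first failing step.
restart_network_with_crs_down() {
    run crsctl stop crs &&
    run systemctl stop network &&
    run systemctl start network &&
    run crsctl start crs
}

restart_network_with_crs_down
```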
2. Set the vm.min_free_kbytes parameter to 0.4% of physical memory
*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***
---The host being changed here has 512 GB of memory, so the value is 1073742*2=2147484
vm.min_free_kbytes=2147484
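The arithmetic behind this value can be sketched as a small shell function. The helper name `min_free_kbytes_for` is hypothetical. It takes memory in kB and an optional NUMA node count (default 1), following the "n * 0.4%" rule from the reference note. The notes above do not apply a NUMA multiplier, so whether one is needed on this host is left open.

```shell
# Hypothetical helper: 0.4% of physical memory (kB) times the NUMA node count.
# Usage: min_free_kbytes_for <mem_total_kb> [numa_nodes]
min_free_kbytes_for() {
    mem_kb=$1
    nodes=${2:-1}
    # 0.4% == 4/1000; adding 500 before dividing rounds to the nearest kB
    echo $(( (mem_kb * 4 * nodes + 500) / 1000 ))
}

min_free_kbytes_for $((256 * 1024 * 1024))   # 256 GB example from the note: 1073742
min_free_kbytes_for $((512 * 1024 * 1024))   # 512 GB host in these notes: 2147484
```

On a live host the input could come from `awk '/^MemTotal:/ {print $2}' /proc/meminfo`, although MemTotal is slightly below the nominal RAM size, so the result will be a little lower than the figures above.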
[oracle@test1 ~]$ cat /proc/sys/vm/min_free_kbytes
65536
[oracle@test1 ~]$
The purpose of adjusting min_free_kbytes is to keep enough physical memory free and to prevent sudden bursts of paging.
vi /etc/sysctl.conf
vm.min_free_kbytes=2147484
--Apply the change
sysctl -p
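As an alternative to editing the file by hand on each node, the same change could be scripted so that an existing entry is replaced rather than duplicated. This is a minimal sketch; `set_sysctl_conf` is a hypothetical helper, not a standard tool.

```shell
# Hypothetical helper: set key=value in a sysctl-style file,
# replacing an existing entry for the key or appending a new one.
set_sysctl_conf() {
    key=$1; value=$2; file=$3
    # Escape dots so the key is matched literally in the regular expressions
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -Eq "^[[:space:]]*${pat}[[:space:]]*=" "$file"; then
        tmp=$(mktemp) || return 1
        sed "s|^[[:space:]]*${pat}[[:space:]]*=.*|${key}=${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"    # overwrite in place to keep ownership and mode
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Example usage as root (not executed here):
# set_sysctl_conf vm.min_free_kbytes 2147484 /etc/sysctl.conf && sysctl -p
# sysctl -n vm.min_free_kbytes    # expected to print 2147484 afterwards
```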