现象:
今天某客户的数据库出现大量的ORA-4031,信息如下所示,此时业务受到影响。
Wed Sep 02 10:32:45 2020
Errors in file /u01/app/oracle/diag/rdbms/dbm01/dbm011/trace/dbm011_ora_377748.trc (incident=576498):
ORA-04031: 无法分配 4160 字节的共享内存 ("shared pool","unknown object","sga heap(1,0)","modification ")
问题分析:
(1)、查看对应的trace文件:
=====================================
Allocation Request Summary Informaton
=====================================
Current information setting: 04014fff
SGA Heap Dump Interval=3600 seconds
Dump Interval=300 seconds
Last Dump Time=09/02/2020 08:09:53
Dump Count=1
Allocation request for: modification
Heap: 0x6005dc08, size: 4160
******************************************************
HEAP DUMP heap name="sga heap(1,0)" desc=0x6005dc08
extent sz=0xfe0 alt=248 het=32767 rec=9 flg=-126 opc=0
parent=(nil) owner=(nil) nex=(nil) xsz=0x10000000 heap=(nil)
fl2=0x20, nex=(nil), dsxvers=1, dsxflg=0x0
dsx first ext=0xb0000000
latch set 1 of 7
durations enabled for this heap
reserved granules for root 0 (granule size 268435456)
====================
=========================
User Session State Object
=========================
----------------------------------------
SO: 0x1e5137d980, type: 4, owner: 0x1d80fd2720, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1d80fd2720, name=session, file=ksu.h LINE:12729, pg=0
(session) sid: 1454 ser: 17473 trans: 0x1dc9544a28, creator: 0x1d80fd2720
flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-/INC
DID: , short-term DID:
txn branch: (nil)
edition#: 100 oct: 2, prv: 0, sql: 0x1e2d71d120, psql: 0x1dff090a40, user: 112/IWMS
ksuxds FALSE at location: 0
service name: dbm01
client details:
O/S info: user: root, term: unknown, ospid: 1234
machine: ys program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
Current Wait Stack:
Not in wait; last wait ended 0.000656 sec ago
Wait State:
fixed_waits=0 flags=0x21 boundary=(nil)/-1
Session Wait History:
elapsed time of 0.000680 sec since last wait
0: waited for 'SGA: allocation forcing component growth'
1: waited for 'SGA: allocation forcing component growth'
2: waited for 'SGA: allocation forcing component growth'
3: waited for 'SGA: allocation forcing component growth'
4: waited for 'SGA: allocation forcing component growth'
5: waited for 'SGA: allocation forcing component growth'
6: waited for 'SGA: allocation forcing component growth'
从4031生成的trace日志文件来看,在发生4031错误之前,很长的时间都在等待SGA: allocation forcing component growth,这说明SGA中的buffer cache和shared pool出现争用。 shared pool不够,所以报ORA-4031。
(2)、查看shared pool的设置:
从AWR报告可以看出,当前这个实例的SGA_TARGET设置为120GB左右,同时指定了shared pool至少30GB。 理论上,30GB已经不少了。
(3)、继续查看ORA-4031对应的trace日志文件,对每个pool 都进行了统计:
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 1
----------------------------------------------
"SQLA " 1668 MB 34%
"free memory " 879 MB 18%
"KGLH0 " 614 MB 13%
"kglsim object batch " 415 MB 9%
"gcs resources " 328 MB 7%
"kglsim heap " 244 MB 5%
"gcs shadows " 228 MB 5%
"KGLHD " 108 MB 2%
"db_block_hash_buckets " 101 MB 2%
"FileOpenBlock " 62 MB 1%
-----------------------------------------
free memory 879 MB
memory alloc. 3985 MB
Sub total 4864 MB
==============================================
TOP 10 MEMORY USES FOR SGA HEAP SUB POOL 2
----------------------------------------------
"SQLA " 1732 MB 40%
"free memory " 796 MB 18%
"KGLH0 " 625 MB 14%
"gcs resources " 329 MB 8%
"gcs shadows " 228 MB 5%
"kglsim object batch " 138 MB 3%
"KGLHD " 116 MB 3%
"db_block_hash_buckets " 102 MB 2%
"kglsim heap " 81 MB 2%
"dbktb: trace buffer " 44 MB 1%
-----------------------------------------
free memory 796 MB
memory alloc. 3556 MB
Sub total 4352 MB
(。。。。。。略。。。。。。)
检查可以发现,在每个subpool中,free memory还是挺大的。
(4)、为了尽快恢复业务,于是手动刷shared pool:
alter system flush shared_pool;
重新启动业务,但发现仍然报ORA-4031,这说明内存不足的这个shared pool区域,即使执行flush操作,内存仍然是不会释放的。
(5)、重新查看数据库的alert日志,对ORA-4031错误进行统计分析,发现全部是sga heap(1,0)不足,这说明是1号subpool中的0号duration中内存用光了,而其他的duration中仍然有free memory。在shared pool中,每个subpool中都有4个durations,分别为: "instance", "session", "cursor" 和 "execution". 由于算法的缺陷,当某个duration中的内存用光,就认为这个subpool用光了。这一行为是由隐含参数_enable_shared_pool_durations进行控制的,默认为TRUE,如果设置成FALSE,则表示这4个durations共享这个subpool中的所有剩余内存。
总结:
建议客户设置"_enable_shared_pool_durations"隐含参数为false,或者关闭ASMM,手动设置 buffer cache 和 shared pool大小。