4031_diag_script.zip
1. SGA中的内存池包含不同大小的内存块。当数据库启动时,就有一个大的内存块分配并被hush buckets 里的空闲列表追踪。随着时间推移,随着内存的分配和释放,内存块被按照大小在不同的hush buckets间移动。当SGA里任何一个内存池里出现不能满足内部分配请求的情况时,ORA-04031就出现了。
shared pool共享池的管理方式不同于其它的内存池。。共享池存放与数据字典和library cache有关的信息。但是,这些内存区域根据空闲列表和最近使用算法(LRU)管理。当在共享池的所有搜索结束后,从LRU列表清除所有的可能清除的对象, 多次扫描空闲列表后,仍没有找到内存块,ORA-04031就出现了。这意味着ORA-04031很难预测。
2. 对共享池的监测,可以看它是否包含许多类似的SQL,只有文字不同。 这种情况会占用更多的共享池内存并引共享池碎片,过多的共享池碎片(fragment)会导致虽然共享池中仍有大量的free memory,但都是尺寸较小的内存块(chunk),当Oracle进程申请一些较大的连续内存空间(memory chunk)时,虽然共享池中的free memory大小远大于申请的连续空间大小,仍会引发ORA-4031错误。使用绑定变量可以使SQL 共享。使用本文所附的脚本可以查出内存中是否有许多类似SQL。
即使使用了绑定变量后,仍然可能存在高version count(子指针)的情况。为了使子指针共享,CURSOR_SHARING参数可能需要调整。metalink 文档Note 296377.1 和 261020.1可以提供详细信息。若造成4031的原因是由于未绑定变量或者游标无法共享导致的过度硬解析(Hard Parse),则应当调整应用绑定变量或者调整初始化参数。
3. 9i中开始引入shared pool subpool子池技术。设计多个共享池子池的目的是分散单个shared pool LRU Latch的并行压力。 ,每一个子池都包含自有的LRU LIST和保留区域等其他内存结构。 subpools子池的数量会在Oracle实例启动(startup nomount)时根据SGA_MAX_SIZE(或11g中的memory_max_target)以及服务器上的CPU数量而自动决定。子池数量最多为7个,在Card系统中为4个。
每一个subpool都是一个"迷你型"的共享池,其包括自有的Freelist、内存结构记录和LRU LIST。subpool子池技术是对shared pool并发扩张能力的增强,且每一个子池现在都使用独立的shared pool child latch来保护。这意味着不再像9i以前的版本那样因为只有一个shared pool latch而剧烈争用了。
但是在实际使用中发现版本9i中仍存在一些BUG,可能导致在子池之间的内存使用分布不平衡均匀,这可能导致虽然部分子池使用率不高,但是个别子池内存过度分配,从而导致ORA-4031错误。一般把这种现象称作"subpool imbalance"。
This issue could occur if in the "Memory Utilization of Subpool" sections in your trace file, there is skewed distribution of allocations across subpools, i.e. allocation size for one area in a subpool is much larger than the same area in another subpool.
Starting in 9i, the SGA can be divided into subpools. Multiple Subpools are designed to relieve pressure on a single LRU latch in the Shared Pool. Each subpool will include its own LRU list, Reserved Area, etc. The number of subpools is computed by an algorithm using the SGA_MAX_SIZE (or memory_max_target in 11g), and the number of CPU's on the server. The maximum number of subpools is 7.
The shared pool and its associated reserved area, as well as the large pool, participate in subpooling. Each subpool is a "mini" shared pool, having its own set of Free Lists, memory structure entries, and LRU list. This was a scalability change made to the Shared Pool/Large Pool to increase the throughput of these pools in that now each subpool is protected by a Pool child latch. This means there is no longer contention in the Shared/Large Pool for a single latch as in earlier versions.
If the distribution of memory usage is not balanced somewhat equally among the subpools, we can get an ORA-4031 as one subpool is over allocated while others remain under allocated. This condition is known as a subpool imbalance.
By analyzing the uploaded file, we have found the following symptoms that may have caused the issue:
** In your trace file, there is evidence of subpool imbalance:
sql area in subpool 0 has 64536 bytes, while in subpool 4 it has 271924816 bytes, which is 4213 times larger.
Recommended Solution
1) Decrease the number of subpools in use, or increase SHARED_POOL_SIZE, depending on the current number of subpools.(To find the current number of subpools, see REFERENCE below)
If the current number of subpools is...
2................increase the parameter SHARED_POOL_SIZE by 15%
Note: If you decrease the subpools to one, you limit the Shared Pool to only one Least Recently Used (LRU) latch. One very large subpool maintains a long LRU list and memory operations in the Shared Pool can impact performance on the database. The level of impact is dependent on many factors. Testing of this change in your environment is the only reliable method to determine the cost of making this change in your database.
3................reduce the number of subpools to 2
4................reduce the number of subpools to 2
5................reduce the number of subpools to 3
6................reduce the number of subpools to 3
7................reduce the number of subpools to 3
Note: Decreasing the number of subpools (>1) may help alleviate the imbalance problem without affecting performance as heavily. Again testing of the change in your environment is the only reliable method to determine the performance impacts in your database.
You can change the number of subpools by:
SQL> alter system set "_kghdsidx_count"=(desired value of number of subpools) scope=spfile;
Note: _kghdsidx_count in the above mentioned SQL is a hidden parameter. However, changing this particular hidden parameter will not have side-effects within the database and can be modified as a workaround. This is not intended to be a long term fix.
2) Restart the database for the change to take effect.
3) If ORA-04031 errors persist, please review the following notes for known subpool imbalance issues.
NOTE:811974.1 - ORA-4031 In Partition Maintenance Job From Subpool Imbalance (Doc ID 811974.1)
NOTE:835176.1 - Ora-4031 Errors using Full Outer Joins
NOTE:6271590.8 - Bug 6271590 - SGA subheap imbalance with lots of free memory in a few subheaps
NOTE:4184298.8 - Bug 4184298 - Subpool imbalance for "session parameters" can lead to ORA-4031
o Instructions for capturing a heapdump from the next occurrence of an ORA-04031 error are shown below.
alter system set events '4031 trace name HEAPDUMP level 2';
o Turn this event off using
alter system set events '4031 trace name HEAPDUMP off';
4) If the latest patchset recommended by the above notes have been applied and ORA-04031 errors persist, please submit a new Service Request to have the issue investigated by Oracle Support.
NOTE: If you captured any heapdump please upload to the Service Request to expedite the resolution time on the Service Request.
REFERENCE: To check the current number of subpools, use the following query:
set pages 1000 lines 120
col name for a60
col value for a30
spool diagnosis1.lst
SELECT SUM(PINS) "EXECUTIONS", SUM(RELOADS) "CACHE MISSES WHILE EXECUTING" FROM V$LIBRARYCACHE;
SELECT substr(sql_text,1,90) "SQL",count(*) , sum(executions) "TotExecs" FROM v$sqlarea WHERE executions < 5
GROUP BY substr(sql_text,1,90) HAVING count(*) > 30 ORDER BY 2;
select * from v$sgastat where pool like 'shared%' order by bytes;
select sa.sql_text,sa.version_count ,ss.* from v$sqlarea sa,v$sql_shared_cursor ss where sa.address=ss.ADDRESS and
sa.version_count > 50 order by sa.version_count ;
select nam.ksppinm NAME, val.KSPPSTVL VALUE from x$ksppi nam, x$ksppsv val where nam.indx = val.indx and
nam.ksppinm like '%shared%' order by 1;
col free_space for 999,999,999,999 head "TOTAL FREE"
col avg_free_size for 999,999,999,999 head "AVERAGE|CHUNK SIZE"
col free_count for 999,999,999,999 head "COUNT"
col reqeust_misses for 999,999,999,999 head "REQUEST|MISSES"
col reqeust_failures for 999,999,999,999 head "REQUEST|FAILURES"
col max_free_size for 999,999,999,999 head "LARGEST CHUNK"
select free_space, avg_free_size, free_count, max_free_size, request_misses, request_failures,LAST_FAILURE_SIZE from v$shared_pool_reserved
col Parameter format a25
col "Session Value" format a15
col "Instance Value" format a15
select a.ksppinm "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value"
from sys.x$ksppi a, sys.x$ksppcv b, sys.x$ksppsv c
where a.indx = b.indx and a.indx = c.indx and a.ksppinm in
('__shared_pool_size','__db_cache_size','__large_pool_size','__java_pool_size','__streams_pool_size');
set pagesize 100
select * from v$sga_dynamic_components;
col requests for 999,999,999
col last_failure_size for 999,999,999 head "LAST FAILURE| SIZE "
col last_miss_size for 999,999,999 head "LAST MISS|SIZE "
col pct for 999 head "HIT|% "
col request_failures for 999,999,999,999 head "FAILURES"
select requests,
decode(requests,0,0,trunc(100-(100*(request_misses/requests)),0)) PCT, request_failures, last_miss_size, last_failure_size
from v$shared_pool_reserved;
select p.inst_id, p.free_space, p.avg_free_size, p.free_count,
p.max_free_size, p.used_space, p.avg_used_size, p.used_count, p.max_used_size,
s.requests, s.request_misses, s.last_miss_size, s.max_miss_size,
s.request_failures, s.last_failure_size, s.aborted_request_threshold,
s.aborted_requests, s.last_aborted_size
from (select avg(x$ksmspr.inst_id) inst_id,
sum(decode(ksmchcls,'R-free',ksmchsiz,0)) free_space,
avg(decode(ksmchcls,'R-free',ksmchsiz,0)) avg_free_size,
sum(decode(ksmchcls,'R-free',1,0)) free_count,
max(decode(ksmchcls,'R-free',ksmchsiz,0)) max_free_size,
sum(decode(ksmchcls,'R-free',0,ksmchsiz)) used_space,
avg(decode(ksmchcls,'R-free',0,ksmchsiz)) avg_used_size,
sum(decode(ksmchcls,'R-free',0,1)) used_count,
max(decode(ksmchcls,'R-free',0,ksmchsiz)) max_used_size from x$ksmspr
where ksmchcom not like '%reserved sto%') p,
(select sum(kghlurcn) requests, sum(kghlurmi) request_misses,
max(kghlurmz) last_miss_size, max(kghlurmx) max_miss_size,
sum(kghlunfu) request_failures, max(kghlunfs) last_failure_size,
max(kghlumxa) aborted_request_threshold, sum(kghlumer) aborted_requests,
max(kghlumes) last_aborted_size from x$kghlu) s;
SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ) SIZ,
To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')||'k' "AVG SIZE"
FROM X$KSMSP GROUP BY KSMCHCLS;
set pagesize 80
set verify off
set heading off
set feedback off
set termout off
col sp_size format 999,999,999 justify right
col x_sp_used format 999,999,999 justify right
col sp_avail format 999,999,999 justify right
col sp_sz_pins format 999,999,999 justify right
col sp_no_pins format 999,999 justify right
col sp_no_obj format 999,999 justify right
col sp_sz_obj format 999,999 justify right
col sp_no_stmts format 999,999 justify right
col sp_sz_kept_chks format 999,999,999 justify right
col sp_no_kept_chks format 999,999 justify right
col val2 new_val x_sp_size noprint
select value val2
from v$parameter
where name='shared_pool_size'
/
col val2 new_val x_sp_used noprint
col val3 new_val x_sp_no_stmts noprint
select sum(sharable_mem+persistent_mem+runtime_mem) val2, count(*) val3
from v$sqlarea
/
col val2 new_val x_sp_no_obj noprint
col val3 new_val x_sp_sz_obj noprint
select decode(count(*),'',0,count(*)) val2,
decode(sum(sharable_mem),'',0,sum(sharable_mem)) val3
from v$db_object_cache
where type in ('PROCEDURE','FUNCTION','PACKAGE','PACKAGE BODY','TRIGGER')
/
col val2 new_val x_sp_avail noprint
select sum(ksmchsiz) val2 from x$ksmsp where ksmchcls = 'free'
/
col val2 new_val x_sp_no_kept_chks noprint
col val3 new_val x_sp_sz_kept_chks noprint
select decode(count(*),'',0,count(*)) val2,
decode(sum(sharable_mem),'',0,sum(sharable_mem)) val3
from v$db_object_cache
where kept='YES'
/
col val2 new_val x_sp_no_pins noprint
select count(*) val2
from v$session a, v$sqltext b
where a.sql_address||a.sql_hash_value = b.address||b.hash_value
/
col val2 new_val x_sp_sz_pins noprint
select sum(sharable_mem+persistent_mem+runtime_mem) val2
from v$session a,
v$sqltext b,
v$sqlarea c
where a.sql_address||a.sql_hash_value = b.address||b.hash_value and
b.address||b.hash_value = c.address||c.hash_value
/
set termout on
set heading off
ttitle -
center 'Shared Pool''s Library Cache Information' skip 2
select 'Size : '
||&x_sp_size sp_size,
'Number of shared cursors : '
||&x_sp_no_stmts sp_no_stmts,
'Used by shared cursors : '
||&x_sp_used,
'Number of programmatic constructs : '
||&x_sp_no_obj sp_no_obj,
'Used by programmatic constructs : '
||&x_sp_sz_obj sp_sz_obj,
'Available : '
||&x_sp_avail sp_avail,
'Kept object chunks : '
||&x_sp_no_kept_chks sp_no_kept_chks,
'Kept object chunks size : '
||&x_sp_sz_kept_chks sp_sz_kept_chks,
'Pinned statements : '
||&x_sp_no_pins sp_no_pins,
'Pinned statements size : '
||&x_sp_sz_pins sp_sz_pins
from dual
/
ttitle off
set heading on
set feedback on
set pages 1000 lines 120
col name for a60
col value for a30
select * from v$sgastat where pool like 'shared%' and name='free memory';
select '0 (<140)' BUCKET, KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10) "From",
count(*) "Count" , max(KSMCHSIZ) "Biggest",
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ<140
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 10*trunc(KSMCHSIZ/10)
UNION ALL
select '1 (140-267)' BUCKET, KSMCHCLS, KSMCHIDX,20*trunc(KSMCHSIZ/20) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ between 140 and 267
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 20*trunc(KSMCHSIZ/20)
UNION ALL
select '2 (268-523)' BUCKET, KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ between 268 and 523
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 50*trunc(KSMCHSIZ/50)
UNION ALL
select '3-5 (524-4107)' BUCKET, KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ between 524 and 4107
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 500*trunc(KSMCHSIZ/500)
UNION ALL
select '6+ (4108+)' BUCKET, KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000) ,
count(*) , max(KSMCHSIZ) ,
trunc(avg(KSMCHSIZ)) "AvgSize", trunc(sum(KSMCHSIZ)) "Total"
from x$ksmsp
where KSMCHSIZ >= 4108
and KSMCHCLS='free'
group by KSMCHCLS, KSMCHIDX, 1000*trunc(KSMCHSIZ/1000);
SELECT KSMCHCLS CLASS, COUNT(KSMCHCLS) NUM, SUM(KSMCHSIZ) SIZ,
To_char( ((SUM(KSMCHSIZ)/COUNT(KSMCHCLS)/1024)),'999,999.00')||'k' "AVG SIZE"
FROM X$KSMSP GROUP BY KSMCHCLS;
SELECT alloc_type, alloc_size, num_objs_flushed, object_loaded
FROM (SELECT ksmlrcom alloc_type,
ksmlrsiz alloc_size,
ksmlrnum num_objs_flushed,
ksmlrhon object_loaded,
RANK() OVER(ORDER BY ksmlrsiz DESC) AS order_ranking
FROM x$ksmlru
WHERE inst_id = USERENV('INSTANCE')
AND ksmlrsiz > 0)
WHERE order_ranking 400)
WHERE order_ranking 0
AND o.type LIKE 'JAVA%')
WHERE order_ranking 0
AND o.type in
('PACKAGE', 'FUNCTION', 'PROCEDURE', 'TRIGGER', 'SEQUENCE'))
WHERE order_ranking 0
AND o.type = 'CURSOR'
)
WHERE order_ranking