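When you receive the alert ORA-01652: unable to extend temp segment by 128 in tablespace xxxx, how do you troubleshoot it? The tablespace xxxx is usually a temporary tablespace, but it can also be a permanent (user) tablespace.
Let's first reproduce the problem by running the SQL statement below in two session windows. The view is a special case (I was too lazy to construct a statement that consumes a lot of temporary segment space, so I reused a script from a case I had at hand): it contains a DISTINCT operation that consumes a large amount of temporary segment space in the TEMP tablespace.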
SQL> select count(*) from v_ies_go_information;
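With two sessions running the statement above, both consume a large amount of temporary segment space, and the following queries capture the statement responsible:
For 8.1.7 to 9.2: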
SELECT A.USERNAME, A.SID, A.SERIAL#, A.OSUSER, B.TABLESPACE, B.BLOCKS, C.SQL_TEXT
FROM V$SESSION A, V$SORT_USAGE B, V$SQLAREA C
WHERE A.SADDR = B.SESSION_ADDR
AND C.ADDRESS= A.SQL_ADDRESS
AND C.HASH_VALUE = A.SQL_HASH_VALUE
ORDER BY B.TABLESPACE, B.BLOCKS;
For 10.1 and above:
COL USERNAME FOR A16;
COL OSUSER FOR A16;
COL TABLESPACE FOR A10;
COL SQL_TEXT FOR A160;
SELECT A.USERNAME, A.SID, A.SERIAL#, A.OSUSER, B.TABLESPACE, B.BLOCKS, C.SQL_TEXT
FROM GV$SESSION A, GV$TEMPSEG_USAGE B, GV$SQLAREA C
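WHERE A.INST_ID = B.INST_ID -- GV$ views: only match rows from the same instance
AND A.INST_ID = C.INST_ID
AND A.SADDR = B.SESSION_ADDR
AND C.ADDRESS = A.SQL_ADDRESS
AND C.HASH_VALUE = A.SQL_HASH_VALUE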
ORDER BY B.TABLESPACE, B.BLOCKS;
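The BLOCKS value consumed in the temporary tablespace changes continuously, so any single query result is only a snapshot of one moment.
You can also use the queries below to find the SQL_ID currently consuming temporary segments in TEMP and how much space it is using. These figures also change in real time.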
SQL> SELECT SQL_ID,SUM(BLOCKS) FROM GV$TEMPSEG_USAGE GROUP BY SQL_ID ORDER BY 2 DESC;
SQL_ID SUM(BLOCKS)
------------- -----------
cw4d8h5fudg6b 456704
SQL> SELECT TABLESPACE_NAME,TOTAL_BLOCKS,USED_BLOCKS,FREE_BLOCKS FROM V$SORT_SEGMENT;
TABLESPACE_NAME TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMPSCM2 1048320 506368 541952
SQL> SELECT TABLESPACE_NAME,TOTAL_BLOCKS,USED_BLOCKS,FREE_BLOCKS FROM V$SORT_SEGMENT;
TABLESPACE_NAME TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMPSCM2 1048320 1030144 18176
In another window, run the following query from time to time to watch how the temporary tablespace usage changes:
SELECT D.TABLESPACE_NAME,
       SPACE "SUM_SPACE(M)",
       BLOCKS "SUM_BLOCKS",
       NVL(USED_SPACE, 0) "USED_SPACE(M)",
       ROUND(NVL(USED_SPACE, 0) / SPACE * 100, 2) "USED_RATE(%)",
       SPACE - NVL(USED_SPACE, 0) "FREE_SPACE(M)" -- NVL keeps this non-NULL when nothing is in use
FROM (SELECT TABLESPACE_NAME,
             ROUND(SUM(BYTES) / ( 1024 * 1024 ), 2) SPACE,
             SUM(BLOCKS) BLOCKS
      FROM DBA_TEMP_FILES
      GROUP BY TABLESPACE_NAME) D,
     (SELECT TABLESPACE,
             -- 8192 assumes an 8 KB block size
             ROUND(SUM(BLOCKS * 8192) / ( 1024 * 1024 ), 2) USED_SPACE
      FROM V$SORT_USAGE
      GROUP BY TABLESPACE) F
WHERE D.TABLESPACE_NAME = F.TABLESPACE(+)
AND D.TABLESPACE_NAME='TEMPSCM2';
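The query above multiplies BLOCKS by 8192, which is only correct when the temporary tablespace uses an 8 KB block size. As a minimal sketch, assuming 10.1 or later (where V$TEMPSEG_USAGE is available), the block size can be looked up from DBA_TABLESPACES instead of being hard-coded. The column aliases USED_MB and SEGMENTS are illustrative names, not from the original.

```sql
-- Sketch: used temp space per tablespace, with the block size taken
-- from DBA_TABLESPACES rather than assumed to be 8192.
SELECT U.TABLESPACE,
       COUNT(*) AS SEGMENTS,
       ROUND(SUM(U.BLOCKS * T.BLOCK_SIZE) / 1024 / 1024, 2) AS USED_MB
FROM   V$TEMPSEG_USAGE U
JOIN   DBA_TABLESPACES T
       ON T.TABLESPACE_NAME = U.TABLESPACE
GROUP  BY U.TABLESPACE
ORDER  BY USED_MB DESC;
```

For scale, if TEMPSCM2 does use 8 KB blocks, the 1030144 used blocks seen in V$SORT_SEGMENT earlier correspond to 1030144 * 8192 / 1048576 = 8048 MB out of 8190 MB in total.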
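In many cases, however, by the time the alert e-mail generated from the alert log arrives, the SQL statement has already finished. In my test sessions, once ORA-1652 was reported, the statement had already ended and returned the error to the session.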
Mon Aug 07 22:23:40 CST 2017
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPSCM2
Mon Aug 07 22:23:40 CST 2017
ORA-1652: unable to extend temp segment by 128 in tablespace TEMPSCM2
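At this point the queries above can no longer capture anything useful, because the temporary space held by those sessions has already been released for reuse. In the test environment the queries return nothing at all; in a production environment they may return misleading information (statements that did not cause the problem). The queries above are only suitable for checking current temporary tablespace usage, not for investigating an ORA-1652 error that has already occurred.
So what should we do? We can use an ASH report to locate the SQL statement that consumed a large amount of temporary segment space. After receiving an ORA-01652 alert, it is best to take an AWR snapshot promptly and then generate an ASH report for the time at which ORA-01652 appears in the alert log. In this test the error occurred at 22:23:40, so we generate the ASH report for 22:20 to 22:25. Adjust the window as needed, but keep it as narrow as possible so that the problem SQL can be pinpointed.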
SQL> @?/rdbms/admin/ashrpt.sql
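As a complement to the report, the snapshot can be taken manually and the in-memory ASH samples for the same window can be queried directly. This is only a sketch under assumptions: it needs 11.2 or later (the TEMP_SPACE_ALLOCATED column), a Diagnostics Pack license, and samples that have not yet aged out of V$ACTIVE_SESSION_HISTORY. The timestamps are the ones from this test, and MAX_TEMP_MB is an illustrative alias.

```sql
-- Take an AWR snapshot now so that recent ASH samples are persisted.
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;

-- Peak temp allocation per SQL_ID inside the 22:20 to 22:25 window.
SELECT SQL_ID,
       ROUND(MAX(TEMP_SPACE_ALLOCATED) / 1024 / 1024) AS MAX_TEMP_MB
FROM   V$ACTIVE_SESSION_HISTORY
WHERE  SAMPLE_TIME BETWEEN TO_TIMESTAMP('2017-08-07 22:20:00', 'YYYY-MM-DD HH24:MI:SS')
                       AND TO_TIMESTAMP('2017-08-07 22:25:00', 'YYYY-MM-DD HH24:MI:SS')
AND    TEMP_SPACE_ALLOCATED IS NOT NULL
GROUP  BY SQL_ID
ORDER  BY MAX_TEMP_MB DESC;
```

The statements at the top of this list are the first candidates to look up in the ASH report.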
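Then find the SQL ID of the relevant statement in the Top SQL section of the ASH report and use an awrsqrpt report to get its execution plan, and check whether the statement consumed a large amount of temporary segment space. A production environment is certainly more complex: Top SQL will contain several statements, and they have to be examined carefully one by one. This analysis is tedious, time-consuming work, which is why the time window chosen for the ASH report matters so much.
In this test, the execution plan that awrsqrpt returned for the SQL_ID showed that the HASH UNIQUE step (the DISTINCT operation) used close to 3 GB of temporary segment space. Together with the temporary space needed by other operations, running the statement in two sessions at the same time triggered ORA-1652.
An ASH report can almost always pinpoint the SQL statement that consumed a large amount of temporary segment space, but the analysis sometimes takes a long time. The Oracle Support note How Can Temporary Segment Usage Be Monitored Over Time? (Doc ID 364417.1) describes how to monitor temporary segment usage, as follows: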
-- Create a table to hold your temporary space monitoring
-- Preferably place it in a tablespace chosen for your environment; do not put it in the SYSTEM tablespace
CREATE TABLE MONITOR_TEMP_SEG_USAGE
(
DATE_TIME DATE,
USERNAME VARCHAR2(30),
SID VARCHAR2(6),
SERIAL# VARCHAR2(6),
OS_USER VARCHAR2(30),
SPACE_USED NUMBER,
SQL_TEXT VARCHAR2(1000)
);
-- Create a stored procedure that inserts SQL statements whose temporary segment usage exceeds the threshold into MONITOR_TEMP_SEG_USAGE
CREATE OR REPLACE PROCEDURE MONITOR_TEMP_SEG_USAGE_INSERT IS
BEGIN
INSERT INTO MONITOR_TEMP_SEG_USAGE
SELECT sysdate,a.username, a.sid, a.serial#, a.osuser, b.blocks, c.sql_text
FROM v$session a, v$sort_usage b, v$sqlarea c
WHERE b.tablespace = 'TEMP' -- replace with the actual temporary tablespace name
AND a.saddr = b.session_addr
AND c.address= a.sql_address
AND c.hash_value = a.sql_hash_value
AND b.blocks*(select block_size from dba_tablespaces where tablespace_name = b.tablespace) > 1024; -- threshold in bytes; raise it to suit your environment
COMMIT;
END;
/
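-- Create a job that runs every 5 minutes and captures the SQL statements whose temporary segment usage exceeds the threshold.
-- First check the existing job numbers so that the number passed to ISUBMIT (20 in this example) is not already in use.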
SQL> SELECT JOB FROM DBA_JOBS;
JOB
----------
141
142
BEGIN
DBMS_JOB.ISUBMIT(JOB => 20,
WHAT => 'MONITOR_TEMP_SEG_USAGE_INSERT;',
NEXT_DATE => SYSDATE,
INTERVAL => 'SYSDATE + (5/1440)');
COMMIT;
END;
/
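Once the job has been collecting data, the rows captured around the time of an ORA-1652 entry in the alert log can be reviewed with a query along these lines. It is a minimal sketch that reuses the table defined above; the time window is the one from this test and should be replaced with your own.

```sql
-- Sessions captured shortly before the error, largest temp consumers first.
-- SPACE_USED holds blocks, because the procedure inserts b.blocks.
SELECT DATE_TIME, USERNAME, SID, SERIAL#, OS_USER, SPACE_USED, SQL_TEXT
FROM   MONITOR_TEMP_SEG_USAGE
WHERE  DATE_TIME BETWEEN TO_DATE('2017-08-07 22:15:00', 'YYYY-MM-DD HH24:MI:SS')
                     AND TO_DATE('2017-08-07 22:25:00', 'YYYY-MM-DD HH24:MI:SS')
ORDER  BY SPACE_USED DESC, DATE_TIME;
```

Because the job samples only every 5 minutes, a short-lived statement can slip between two runs, so an empty result does not prove that no statement was using temporary space.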
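In addition, on Oracle 11.2 and later you can use the following query to find SQL statements whose temporary segment usage exceeded a given threshold. By adjusting the conditions, this is usually enough to find the SQL that caused the ORA-01652 error.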
SELECT SQL_ID,MAX(TEMP_SPACE_ALLOCATED)/(1024*1024*1024) GIG
FROM DBA_HIST_ACTIVE_SESS_HISTORY
WHERE
SAMPLE_TIME > SYSDATE-2 AND
TEMP_SPACE_ALLOCATED > (1024*1024*1024)
GROUP BY SQL_ID ORDER BY SQL_ID;
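The material above covers most of what is needed to troubleshoot ORA-1652. The scenarios below, taken from the Oracle Support notes, describe how ORA-1652 arises, which helps a great deal in understanding its causes and consequences.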
EXAMPLE 1:
Temporary tablespace TEMP is being used and is 50gb in size (a recommended minimum for 11g)
TIME 1 : Session 1 starts a long running query
TIME 2 : Session 2 starts a query and at this point in time Session 1 has consumed 48gb of TEMP's free space
TIME 3 : Session 1 and Session 2 receive an ORA-1652 because the tablespace has exhausted all of its free space
Both sessions fail .. and all temp space used by the sessions is freed (the segments used are marked FREE for reuse)
TIME 4 : SMON cleans up the temporary segments used by Session 1 and Session 2 (deallocates the storage)
TIME 5 : Queries are run against the views V$SORT_USAGE or V$TEMPSEG_USAGE and V$SORT_SEGMENT ... and it is found that no space is being used (this is normal)
EXAMPLE 2:
Permanent tablespace INDEX_TBS is being used and has 20gb of space free
TIME 1 : Session 1 begins a CREATE INDEX command with the index stored in INDEX_TBS
TIME 2 : Session 1 exhausts all of the free space in INDEX_TBS as a result the CREATE INDEX abends
TIME 3 : SMON cleans up the temporary segments that were used to attempt to create the index
TIME 4 : Queries are run against the views V$SORT_USAGE or V$TEMPSEG_USAGE ... and it is found that the INDEX_TBS has no space used (this is normal)
In some cases, you may find that the ORA-1652 is not reported for a temporary tablespace, but a permanent one. This is not an abnormal behaviour and it can occur for example while creating or dropping objects like tables and indexes in permanent tablespaces. Reference : Note 19047.1 - OERR: ORA 1652 unable to extend temp segment by %s in tablespace %s
In such cases the following note will be of use :
Note 100492.1 - ORA-01652: Estimate Space Needed to CREATE INDEX
If the tablespace in which the TEMPORARY segment resides is of type PERMANENT, also check that the following events are not set in the initialization parameter file:
event="10061 trace name context forever, level 10"
event="10269 trace name context forever, level 10"
If they are set, unset them and restart database.
These two events prevent SMON from cleaning up.
Reference : Note 1039341.6 - Temporary Segments Are Not Being De-Allocated After a Sort
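The two checks described in the note can be sketched as follows. This is an illustration rather than a procedure taken from the note: the first query shows whether any events are set through the event initialization parameter, and the second lists temporary segments that currently exist in permanent tablespaces (for example during or after a failed CREATE INDEX).

```sql
-- Is the "event" parameter set, and does it include 10061 or 10269?
SELECT NAME, VALUE
FROM   V$PARAMETER
WHERE  NAME = 'event';

-- Temporary segments currently sitting in permanent tablespaces.
SELECT OWNER, SEGMENT_NAME, TABLESPACE_NAME,
       ROUND(BYTES / 1024 / 1024, 2) AS SIZE_MB
FROM   DBA_SEGMENTS
WHERE  SEGMENT_TYPE = 'TEMPORARY'
ORDER  BY BYTES DESC;
```

If the second query keeps returning the same rows long after the failed operation, that suggests cleanup is not happening, and the event settings are worth checking.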
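Finally, the fix. There are two ways to resolve ORA-01652:
1: If the temporary tablespace really is too small, extend it: add a temporary data file, or enable autoextend on the existing temporary data files.
2: Tune the SQL statements that consume a large amount of temporary segment space so that they use less.
The situation is somewhat different in a RAC environment; see NOTE:280578.1 - Troubleshooting ORA-1652 Errors in RAC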
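For the first option, a minimal sketch of the statements involved is shown below. The tablespace name TEMPSCM2 comes from this test; the file paths and sizes are hypothetical placeholders and must be replaced with values that suit your storage layout.

```sql
-- Check the existing tempfiles and their autoextend settings first.
SELECT FILE_NAME,
       BYTES / 1024 / 1024    AS SIZE_MB,
       AUTOEXTENSIBLE,
       MAXBYTES / 1024 / 1024 AS MAX_MB
FROM   DBA_TEMP_FILES
WHERE  TABLESPACE_NAME = 'TEMPSCM2';

-- Option A: add another tempfile (hypothetical path and size).
ALTER TABLESPACE TEMPSCM2
  ADD TEMPFILE '/u01/app/oracle/oradata/ORCL/tempscm2_02.dbf' SIZE 4G;

-- Option B: let an existing tempfile grow, with an upper bound
-- (hypothetical path; use a FILE_NAME returned by the query above).
ALTER DATABASE TEMPFILE '/u01/app/oracle/oradata/ORCL/tempscm2_01.dbf'
  AUTOEXTEND ON NEXT 256M MAXSIZE 16G;
```

Setting a MAXSIZE is a deliberate choice here: an unbounded autoextending tempfile lets a single runaway statement fill the file system instead of failing with ORA-1652.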
References:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=144013041361565&id=793380.1&displayIndex=4&_afrWindowMode=0&_adf.ctrl-state=sucf6uzjm_402
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=147317477297631&id=100492.1&_afrWindowMode=0&_adf.ctrl-state=sucf6uzjm_672