都是標量子查詢惹的禍
都是標量子查詢惹的禍
系統又報了一個跑的慢的sql語句,看圖就知道這個很恐怖的,已經跑了1天了,還需要跑6個月的時間,
把sql語句拿出來瞅瞅:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未結清餘額,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 餘額,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 額度區間,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企業主'
else '非小企業主' end 小企業主標識
from riskpubstrategy.lwt_ambs_cc_201406 A
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企業主'
else '非小企業主' end;
記憶體中的執行計劃:SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('a50xqp45uy256',0,'advanced'));
簡單分析下,查詢的只有一個表,這就很奇怪了,不管這個表多麼大,也不可能跑這麼久的吧,但是有一個標量子查詢,根據經驗,如果標量子查詢的表資料量很多,或者關聯的列沒有索引的話,標量子查詢的效能將會是非常差的,我們由執行計劃也可以看出標量子查詢的表RISKPUBSC.MICRO_BUSI_DATABASE的關聯列沒有索引,至此,我們猜測可能是由於這個標量子查詢引起sql語句效能低下,猜測歸猜測,那我們實驗一下呢?
先去掉標量子查詢,然後執行一下,如下:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未結清餘額,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 餘額,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 額度區間/*,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企業主'
else '非小企業主' end 小企業主標識*/
from riskpubstrategy.lwt_ambs_cc_201406 A
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end/*,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企業主'
else '非小企業主' end*/
執行一下:,發現這個sql很快的,大約274秒,也就是5分鐘左右,好吧,看來真是標量子查詢惹的禍,,,,,,那我們首先建立索引看看效能如何呢?
create index ind_MICRO_BUSI_DATABASE_acct on RISKPUBSC.MICRO_BUSI_DATABASE(acct) NOLOGGING parallel 8;
alter index ind_MICRO_BUSI_DATABASE_acct NOPARALLEL;
然後再重新執行一下之前的語句:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未結清餘額,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 餘額,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 額度區間,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企業主'
else '非小企業主' end 小企業主標識
from riskpubstrategy.lwt_ambs_cc_201406 A
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end,
CASE WHEN EXISTS (SELECT 1 FROM RISKPUBSC.MICRO_BUSI_DATABASE B WHERE STATUS_CURR IS NOT NULL AND A.ACCT= B.ACCT) THEN '小企業主'
else '非小企業主' end;
加了索引後其執行計劃:
發現效能比較好,,大約就8分鐘搞定,,,,,,至此其實這個sql加個索引就算優化好了,但是根據經驗,如果標量子查詢的表的資料量超大(一般認為超過100W,不是絕對的)的時候,而且關聯的列又不是主鍵的時候,標量子查詢的效能其實不是怎麼好,,,,,,好吧,那我們來看看他們的資料量如何呢?????
首先看看資料量,
SELECT v.TABLE_NAME,
v.BLOCKS,
v.table_size2,
v.NUM_ROWS,
v.LAST_ANALYZED
FROM vw_table_lhr v
WHERE v.TABLE_NAME IN ('LWT_AMBS_CC_201406',
'MICRO_BUSI_DATABASE');
發現標量子查詢的表是有點大,300W的資料量,如果是標量子查詢的話,大約要對標量子查詢的表(MICRO_BUSI_DATABASE)掃描1988W次,相當於NL連線了,所以這個效能不怎麼好,那麼就修改為左外連線唄,修改後的sql:
--201406 XX
select SUM(UNPOSTING_AMT_CYC) 分期未結清餘額,
SUM(greatest(nvl(BAL_TOTAL,0),0)) 餘額,COUNT(1),DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end 額度區間,
CASE WHEN b.ACCT is not null THEN '小企業主'
else '非小企業主' end 小企業主標識
from riskpubstrategy.lwt_ambs_cc_201406 A
LEFT outer join RISKPUBSC.MICRO_BUSI_DATABASE B
on (a.ACCT=b.ACCT)
group by DELQ_LEVEL_CI,city,FQ_TOTAL,
CASE WHEN greatest(nvl(BAL_TOTAL,0),0) < 500 THEN '1.[0,500)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 500
AND greatest(nvl(BAL_TOTAL,0),0) < 5000 THEN '2.[500,5000)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 5000
AND greatest(nvl(BAL_TOTAL,0),0) < 20000 THEN '3.[5000,2W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 20000
AND greatest(nvl(BAL_TOTAL,0),0) < 50000 THEN '4[2W,5W)'
WHEN greatest(nvl(BAL_TOTAL,0),0) >= 50000 THEN '5[5W,+)'
else '其他' end,
CASE WHEN b.ACCT is not null THEN '小企業主'
else '非小企業主' end
執行計劃:
時間呢?????尼瑪,,,,,,,這麼快,,,,,由6個月轉換為44秒,,,,,,又是由自行車的速度到飛船的速度呀,,,,,,飛躍吧,,,,,,,,
這裡細心的人會發現,我去掉了b.STATUS_CURR is not null 這個條件,因為我查詢過這個列本身就沒有為空的資料,,,,,這個又是個小小的優化喲,,,,,
最後簡單總結下:
標量子查詢的使用是有條件的:
①標量子查詢的涉及的表的關聯列必須要有索引或者關聯列是主鍵列
②標量子查詢的涉及的表的資料量不能太大,資料量太大即使關聯列有索引依然很慢的
③對同一個表不能有多個標量子查詢,這樣效能也不會很高
第三個注意事項是神馬意思???????簡單舉個例子:
Select (select a from taba t1 where t1.id=t.id) a,
(select b from taba t1 where t1.id=t.id) b,
(select c from taba t1 where t1.id=t.id) c
……….
From tabb ;
這種情況大家還是修改為左外連線比較快。。。。。。如果taba稍微大點的話就會導致到大表的多次掃描,這個嚴重影響效能。。。。。。。。。