• CBO为什么不走索引?


    原帖是Itpub上的网友提出一个CBO为什么不走索引的问题, 该问题的演示如下:  
    SQL> create table maclean1 as select * from dba_objects;
    
    Table created.
    
    SQL> update maclean1 set status='INVALID' where owner='MACLEAN';
    
    2 rows updated.
    
    SQL> commit;                                       
    
    Commit complete.
    
    SQL> create index ind_maclean1 on maclean1(status);
    
    Index created.
    
    SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN1',cascade=>true);
    
    PL/SQL procedure successfully completed.
    
    SQL> explain plan for select * from maclean1 where status='INVALID';
    
    Explained.
    
    SQL> set linesize 140 pagesize 1400
    SQL> select * from table(dbms_xplan.display());
    
    PLAN_TABLE_OUTPUT
    ---------------------------------------------------------------------------
    Plan hash value: 987568083
    
    ------------------------------------------------------------------------------
    | Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
    ------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT  |          | 11320 |  1028K|    85   (0)| 00:00:02 |
    |*  1 |  TABLE ACCESS FULL| MACLEAN1 | 11320 |  1028K|    85   (0)| 00:00:02 |
    ------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       1 - filter("STATUS"='INVALID')
    
    13 rows selected.
    
    10053 trace 
    
    Access path analysis for MACLEAN1
    ***************************************
    SINGLE TABLE ACCESS PATH
      Single Table Cardinality Estimation for MACLEAN1[MACLEAN1]
      Column (#10): STATUS(
        AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.500000
      Table: MACLEAN1  Alias: MACLEAN1
        Card: Original: 22639.000000  Rounded: 11320  Computed: 11319.50  Non Adjusted: 11319.50
      Access Path: TableScan
        Cost:  85.33  Resp: 85.33  Degree: 0
          Cost_io: 85.00  Cost_cpu: 11935345
          Resp_io: 85.00  Resp_cpu: 11935345
      Access Path: index (AllEqRange)
        Index: IND_MACLEAN1
        resc_io: 185.00  resc_cpu: 8449916
        ix_sel: 0.500000  ix_sel_with_filters: 0.500000
        Cost: 185.24  Resp: 185.24  Degree: 1
      Best:: AccessPath: TableScan
             Cost: 85.33  Degree: 1  Resp: 85.33  Card: 11319.50  Bytes: 0
    
    可以从以上10053中看到因为没有直方图存在,所以这里的Density = 0.5 是从 1/ NDV 算得的
    也就意味着粗糙的统计信息显示STATUS='INVALID"的数据行占总行数的一半,
    所以优化器选择做全表扫描是有道理的
        以上符合"STATUS"='INVALID' condition的只有2行,且status列上建有索引,同时也使用了dbms_stats包收集表和索引上的统计信息,照理说CBO因该选择INDEX Range ind_maclean1,而避免全表扫描,但实际优化器opitimizer没有这样做。     实际上这个问题和统计信息收集时是否收集直方图有关系,只要收集了直方图,那么优化器就会了解到status='INVALID'条件仅有少量的card满足,具有良好的选择性:  
    [oracle@vrh4 ~]$ sqlplus  / as sysdba
    
    SQL*Plus: Release 11.2.0.2.0 Production on Mon Oct 17 19:15:45 2011
    
    Copyright (c) 1982, 2010, Oracle.  All rights reserved.
    
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    With the Partitioning, OLAP, Data Mining and Real Application Testing options
    
    SQL> select * from v$version;
    
    BANNER
    --------------------------------------------------------------------------------
    Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
    PL/SQL Release 11.2.0.2.0 - Production
    CORE    11.2.0.2.0      Production
    TNS for Linux: Version 11.2.0.2.0 - Production
    NLSRTL Version 11.2.0.2.0 - Production
    
    SQL> show parameter optimizer_fea
    
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      11.2.0.2
    
    SQL> select * from global_name;
    
    GLOBAL_NAME
    --------------------------------------------------------------------------------
    www.oracledatabase12g.com  & www.askmaclean.com
    
    SQL> drop table maclean;
    
    Table dropped.
    
    SQL>  create table maclean as select * from dba_objects;
    
    Table created.
    
    SQL> update maclean set status='INVALID' where owner='MACLEAN';
    
    2 rows updated.
    
    SQL>  commit;
    
    Commit complete.
    
    SQL> create index ind_maclean on maclean(status);
    
    Index created.
    
    SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN',cascade=>true, method_opt=>'FOR ALL COLUMNS SIZE 2');
    
    PL/SQL procedure successfully completed.
      这里我们仅收集2个bucket的直方图, 就足以让优化器做出正确选择了。 得益于Quest公司的Guy Harrison所写的一个列出FREQUENCY直方图信息的脚本,以下为该脚本:  
    rem
    rem Generate a histogram of data distribution in a column as recorded
    rem  in dba_tab_histograms
    rem
    rem Guy Harrison Jan 2010 : www.guyharrison.net
    rem
    rem hexstr function is from From http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:707586567563 
    
    set pagesize 10000
    set lines 120
    set verify off
    
    col char_value format a10 heading "Endpoint|value"
    col bucket_count format 99,999,999 heading "bucket|count"
    col pct format 999.99 heading "Pct"
    col pct_of_max format a62 heading "Pct of|Max value"
    rem col endpoint_value format 9999999999999 heading "endpoint|value" 
    
    CREATE OR REPLACE FUNCTION hexstr (p_number IN NUMBER)
        RETURN VARCHAR2
    AS
        l_str      LONG := TO_CHAR (p_number, 'fm' || RPAD ('x', 50, 'x'));
        l_return   VARCHAR2 (4000);
    BEGIN
        WHILE (l_str IS NOT NULL)
        LOOP
            l_return := l_return || CHR (TO_NUMBER (SUBSTR (l_str, 1, 2), 'xx'));
            l_str := SUBSTR (l_str, 3);
        END LOOP;
    
        RETURN (SUBSTR (l_return, 1, 6));
    END;
    /
    
    WITH hist_data AS (
    SELECT endpoint_value,endpoint_actual_value,
           NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
           endpoint_number,
           endpoint_number,
           endpoint_number
           - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
               bucket_count
    FROM dba_tab_histograms
    JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
    WHERE     owner = '&owner'
          AND table_name = '&table'
          AND column_name = '&column'
          AND histogram='FREQUENCY')
    SELECT nvl(endpoint_actual_value,endpoint_value) endpoint_value ,
           bucket_count,
           ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
           RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
      FROM hist_data;
    
    WITH hist_data AS (
    SELECT endpoint_value,endpoint_actual_value,
           NVL(LAG (endpoint_value) OVER (ORDER BY endpoint_value),' ') prev_value,
           endpoint_number,
           endpoint_number,
           endpoint_number
           - NVL (LAG (endpoint_number) OVER (ORDER BY endpoint_value), 0)
               bucket_count
    FROM dba_tab_histograms
    JOIN dba_tab_col_statistics USING (owner, table_name,column_name)
    WHERE     owner = '&owner'
          AND table_name = '&table'
          AND column_name = '&column'
          AND histogram='FREQUENCY')
    SELECT hexstr(endpoint_value) char_value,
           bucket_count,
           ROUND(bucket_count*100/SUM(bucket_count) OVER(),2) PCT,
           RPAD(' ',ROUND(bucket_count*50/MAX(bucket_count) OVER()),'*') pct_of_max
      FROM hist_data
    ORDER BY endpoint_value;
      使用该脚本,我们可以更直观的感受FREQUENCY直方图信息:     这里dbms_stats 包收集到的STATUS='INVALID' bucket count=9 percent = 0.04 ,可以和之后的10053 trace中的信息对比以下:  
    SQL> explain plan for select * from maclean where status='INVALID';
    
    Explained.
    
    SQL>  select * from table(dbms_xplan.display());
    
    PLAN_TABLE_OUTPUT
    -------------------------------------
    Plan hash value: 3087014066
    
    -------------------------------------------------------------------------------------------
    | Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
    -------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT            |             |     9 |   837 |     2   (0)| 00:00:01 |
    |   1 |  TABLE ACCESS BY INDEX ROWID| MACLEAN     |     9 |   837 |     2   (0)| 00:00:01 |
    |*  2 |   INDEX RANGE SCAN          | IND_MACLEAN |     9 |       |     1   (0)| 00:00:01 |
    -------------------------------------------------------------------------------------------
    
    Predicate Information (identified by operation id):
    ---------------------------------------------------
    
       2 - access("STATUS"='INVALID')
    以上可以看到只要收集了直方图CBO就会认识到满足STATUS='INVALID'的cardnality很少 , 该条件具有良好的选择性 ,使用index range scan而非Full table scan。 我们进一步来看看有直方图情况下的10053 trace:
    SQL> alter system flush shared_pool;
    
    System altered.
    
    SQL> oradebug setmypid;
    Statement processed.
    
    SQL> oradebug event 10053 trace name context forever ,level 1;
    Statement processed.
    
    SQL> explain plan for select * from maclean where status='INVALID';
    
    Explained.
    
    SINGLE TABLE ACCESS PATH
      Single Table Cardinality Estimation for MACLEAN[MACLEAN]
      Column (#10):
        NewDensity:0.000199, OldDensity:0.000022 BktCnt:22640, PopBktCnt:22640, PopValCnt:2, NDV:2
    
      这里的NewDensity= bucket_count / SUM(bucket_count) /2
    
       Column (#10): STATUS(
        AvgLen: 7 NDV: 2 Nulls: 0 Density: 0.000199
        Histogram: Freq  #Bkts: 2  UncompBkts: 22640  EndPtVals: 2
      Table: MACLEAN  Alias: MACLEAN
     Card: Original: 22640.000000 Rounded: 9 Computed: 9.00 Non Adjusted: 9.00
      Access Path: TableScan
        Cost:  85.30  Resp: 85.30  Degree: 0
          Cost_io: 85.00  Cost_cpu: 10804625
          Resp_io: 85.00  Resp_cpu: 10804625
      Access Path: index (AllEqRange)
        Index: IND_MACLEAN
        resc_io: 2.00  resc_cpu: 20763
        ix_sel: 0.000398  ix_sel_with_filters: 0.000398
        Cost: 2.00  Resp: 2.00  Degree: 1
      Best:: AccessPath: IndexRange
      Index: IND_MACLEAN
             Cost: 2.00  Degree: 1  Resp: 2.00  Card: 9.00  Bytes: 0
      上例中我们手动指定收集2 bucket的直方图后CBO优化器才能做出正确的选择,那么岂不是要人工干预来收集列的统计信息,默认的dbms_stats.DEFAULT_METHOD_OPT方式不能为我们提供有效的直方图收集方式吗? 实际上dbms_stats的自动决定直方图的收集与否及收集的桶数受到col_usage$基本中列充当predicate的历史记录影像,关于col_usage$详见<了解你所不知道的SMON功能(四):维护col_usage$字典基表>。   假设在统计表上信息的dbms_stats存储过程执行之前,col_usage$中已经存有表上相关列充当predicate的记录,那么dbms_stats存储过程就会考虑为该列收集直方图信息, 如:  
    SQL> drop table maclean;
    
    Table dropped.
    
    SQL>  create table maclean as select * from dba_objects;
    
    Table created.
    
    SQL> update maclean set status='INVALID' where owner='MACLEAN';
    
    2 rows updated.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> create index ind_maclean on maclean(status);
    
    Index created.
    
    使用dbms_stats默认method_opt收集maclean表
    
    SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');
    
    PL/SQL procedure successfully completed.
    
    @histogram.sql 
    
    Enter value for owner: SYS
    old  12:    WHERE owner = '&owner'
    new  12:    WHERE owner = 'SYS'
    Enter value for table: MACLEAN
    old  13:      AND table_name = '&table'
    new  13:      AND table_name = 'MACLEAN'
    Enter value for column: STATUS
    old  14:      AND column_name = '&column'
    new  14:      AND column_name = 'STATUS'
    
    no rows selected
    
    因为缺少col_usage$列使用信息,所以依然没有收集status列的直方图
    
        declare
        begin
        for i in 1..500 loop
    	execute immediate ' alter system flush shared_pool';
    	DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
        execute immediate 'select count(*)  from maclean where status=''INVALID'' ' ;
        end loop;
        end;
        /
    
    PL/SQL procedure successfully completed.
    
    SQL> select obj# from obj$ where name='MACLEAN';
    
          OBJ#
    ----------
         97215
    SQL> select * from  col_usage$ where  OBJ#=97215;
    
           OBJ#    INTCOL# EQUALITY_PREDS EQUIJOIN_PREDS NONEQUIJOIN_PREDS RANGE_PREDS LIKE_PREDS NULL_PREDS TIMESTAMP
    ---------- ---------- -------------- -------------- ----------------- ----------- ---------- ---------- ---------
         97215          1              1              0                 0           0          0          0 17-OCT-11
         97215         10            499              0                 0           0          0          0 17-OCT-11
    
    SQL> exec dbms_stats.gather_table_stats('SYS','MACLEAN');
    
    PL/SQL procedure successfully completed.
    
    @histogram.sql 
    
    Enter value for owner: SYS
    Enter value for table: MACLEAN
    Enter value for column: STATUS
    
    Endpoint        bucket         Pct of
    value            count     Pct Max value
    ---------- ----------- ------- --------------------------------------------------------------
    INVALI               2     .04
    VALIC3           5,453   99.96  *************************************************
  • 相关阅读:
    java Servlet小结
    Java 自定义客户端与服务器
    JAVA IO流总结
    java udp与tcp
    tomcat作为服务器的配置
    Linux
    Git -- 如何删除本地仓库
    ASP.NET Core 基础 Startup 类
    ASP.NET Core解说之Middleware(中间件)
    一、Redis安装 Redis学习记录
  • 原文地址:https://www.cnblogs.com/macleanoracle/p/2968041.html
Copyright © 2020-2023  润新知