• Gathering Oracle Statistics



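    Note that both GATHER STALE and GATHER AUTO rely on table monitoring.
    After you run an alter table xxx monitoring command, Oracle tracks changes to that table in the dba_tab_modifications view (from 10g onward, monitoring is enabled by default whenever STATISTICS_LEVEL is TYPICAL or ALL).
    The view tells you roughly how many inserts, updates, and deletes have occurred since statistics were last gathered.
    SELECT * FROM Sys.Dba_Tab_Modifications WHERE Table_Owner = 'SCOTT';
    When tables are monitored with alter table xxx monitoring, use the auto option of dbms_stats.
    The auto option creates histograms based on the data distribution and on how the application accesses each column (that is, the column workload determined through monitoring).
    Using method_opt => 'for all columns size auto' is similar to using GATHER AUTO in the options parameter of dbms_stats.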
    begin
    dbms_stats.gather_schema_stats(ownname => 'SCOTT',
                                   estimate_percent => dbms_stats.auto_sample_size,
                                   method_opt => 'for all columns size auto',
                                   degree => 7);
    end;
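
    The monitoring data that GATHER STALE depends on can be inspected and used roughly as follows. This is a minimal sketch, assuming the SCOTT schema from the examples above and a session that can read the DBA views. The flush call comes first because modification counts are buffered in memory before they reach the dictionary.

    ```sql
    -- Push buffered DML monitoring data into the dictionary so the view is current.
    exec dbms_stats.flush_database_monitoring_info;

    -- Changes recorded per table since statistics were last gathered.
    SELECT table_name, inserts, updates, deletes, truncated, timestamp
    FROM   sys.dba_tab_modifications
    WHERE  table_owner = 'SCOTT'
    ORDER  BY table_name;

    -- Re-gather only the objects whose statistics Oracle considers stale.
    begin
      dbms_stats.gather_schema_stats(ownname          => 'SCOTT',
                                     options          => 'GATHER STALE',
                                     estimate_percent => dbms_stats.auto_sample_size,
                                     cascade          => true);
    end;
    /
    ```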

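    The estimate_percent option
    The following estimate_percent setting is a relatively recent design that lets dbms_stats automatically estimate the best percentage of a segment to sample when gathering statistics:
    estimate_percent => dbms_stats.auto_sample_size
    To verify the accuracy of automatic sampling, look at the sample_size column of dba_tables. Interestingly, with automatic sampling Oracle chose a sample size of 5 to 20 percent in the releases this note describes (9i/10g); later releases use a different sampling algorithm, so expect other values there. Remember: the better the quality of the statistics, the better the decisions the CBO makes.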
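
    One way to do that check is to compare sample_size with num_rows. The query below is a sketch, assuming the SCOTT schema used elsewhere in this note; sample_pct is a derived column introduced here for illustration.

    ```sql
    -- Effective sampling percentage per table after the last gather.
    SELECT table_name,
           num_rows,
           sample_size,
           ROUND(100 * sample_size / NULLIF(num_rows, 0), 1) AS sample_pct,
           last_analyzed
    FROM   dba_tables
    WHERE  owner = 'SCOTT'
    ORDER  BY table_name;
    ```

    NULLIF avoids a divide-by-zero error on empty tables; for those rows sample_pct is simply NULL.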

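    The method_opt option
    The method_opt parameter of dbms_stats is particularly useful for refreshing statistics when table and index data change. It also determines which columns get histograms.
    In some cases, the distribution of values within an index influences whether the CBO uses the index or performs a full table scan. For example, if the value specified in the where clause matches a disproportionately large share of the rows, a full table scan is cheaper than index access.
    If you have a highly skewed index (some values account for a disproportionate number of rows), you can create Oracle histogram statistics. In the real world, however, this is fairly rare. One of the most common mistakes with the CBO is introducing unnecessary histograms into the statistics. As a rule of thumb, use a histogram only when different values of the column call for different execution plans.
    To generate histograms intelligently, Oracle provides the method_opt parameter of dbms_stats. The method_opt clause has several important options, including skewonly, repeat, and auto:
    method_opt=>'for all columns size skewonly'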
    method_opt=>'for all columns size repeat'
    method_opt=>'for all columns size auto'
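
    Before asking for a histogram, it can help to look at the skew directly. The query below is a sketch that assumes the standard SCOTT.EMP demo table and its DEPTNO column; substitute any table and column you suspect of being skewed.

    ```sql
    -- Share of rows held by each distinct value of the column.
    SELECT deptno,
           COUNT(*)                                          AS row_count,
           ROUND(100 * RATIO_TO_REPORT(COUNT(*)) OVER (), 1) AS pct_of_rows
    FROM   scott.emp
    GROUP  BY deptno
    ORDER  BY row_count DESC;
    ```

    If one value accounts for a large share of the rows while others are rare, the column is a histogram candidate; if the percentages are roughly equal, a histogram is unlikely to change any plan.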

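    The skewonly option is very time-consuming because it examines the distribution of values for every column in every index.
    If dbms_stats finds a column whose values are unevenly distributed, it creates a histogram for that column to help the cost-based SQL optimizer decide between index access and a full table scan. For example, if a single value of an indexed column appears in 50% of the rows, a full table scan retrieves those rows faster than an index scan.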
    --*************************************************************
    -- SKEWONLY option—Detailed analysis
    --
    -- Use this method for a first-time analysis for skewed indexes
    -- This runs a long time because all indexes are examined
    --*************************************************************
    begin
    dbms_stats.gather_schema_stats(ownname => 'SCOTT',
                                   estimate_percent => dbms_stats.auto_sample_size,
                                   method_opt => 'for all columns size skewonly',
                                   degree => 7);
    end;


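    When re-gathering statistics, the repeat option makes the job less resource-intensive. With repeat, dbms_stats re-analyzes only the existing histograms and does not look for new histogram candidates. Use this approach for regular, scheduled re-analysis.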
    --**************************************************************
    -- REPEAT OPTION - Only reanalyze histograms for indexes
    -- that have histograms
    --
    -- Following the initial analysis, the weekly analysis
    -- job will use the “repeat” option. The repeat option
    -- tells dbms_stats that no indexes have changed, and
    -- it will only reanalyze histograms for
    -- indexes that have histograms.
    --**************************************************************
    begin
    dbms_stats.gather_schema_stats(ownname => 'SCOTT',
                                   estimate_percent => dbms_stats.auto_sample_size,
                                   method_opt => 'for all columns size repeat',
                                   degree => 7);
    end;
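
    The weekly job mentioned in the comment block above could be scheduled with DBMS_SCHEDULER (10g and later). This is only a sketch: the job name WEEKLY_STATS_REPEAT and the Sunday 02:00 schedule are assumptions chosen for illustration, not values from the original note.

    ```sql
    begin
      dbms_scheduler.create_job(
        job_name        => 'WEEKLY_STATS_REPEAT',  -- hypothetical job name
        job_type        => 'PLSQL_BLOCK',
        job_action      => 'begin
                              dbms_stats.gather_schema_stats(
                                ownname          => ''SCOTT'',
                                estimate_percent => dbms_stats.auto_sample_size,
                                method_opt       => ''for all columns size repeat'',
                                degree           => 7);
                            end;',
        start_date      => systimestamp,
        -- Assumed quiet period; pick a window that is idle on your system.
        repeat_interval => 'FREQ=WEEKLY; BYDAY=SUN; BYHOUR=2; BYMINUTE=0; BYSECOND=0',
        enabled         => true,
        comments        => 'Weekly re-analysis that refreshes existing histograms only');
    end;
    /
    ```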

    Table statistics are stored in the data dictionary and can be queried with the following SQL:
    SELECT Table_Name,Num_Rows,Blocks,Empty_Blocks,Avg_Space,Chain_Cnt,Avg_Row_Len,Sample_Size,Last_Analyzed
    FROM Dba_Tables WHERE Owner = 'SCOTT';

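    A summary of the ANALYZE command versus the DBMS_STATS package
    1. For partitioned tables, use DBMS_STATS rather than the ANALYZE statement.
    a) It can run in parallel, across multiple users and multiple tables.
    b) It can gather statistics for the partitioned table as a whole as well as for individual partitions.
    c) It can compute statistics at different levels: a single partition, subpartitions, the whole table, or all partitions. It does not gather cluster statistics.
    d) It can export statistics.
    e) It can gather statistics automatically.
    2. Drawbacks of DBMS_STATS
    a) It cannot VALIDATE STRUCTURE.
    b) It cannot collect CHAINED ROWS information or statistics for CLUSTER TABLEs; both still require the ANALYZE statement.
    c) In 9i, DBMS_STATS does not analyze indexes by default because cascade defaults to FALSE and must be set to TRUE explicitly. From 10g the default comes from get_param('CASCADE'), so check it rather than assume.
    3. ANALYZE cannot be used on external tables; only DBMS_STATS can gather their statistics.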
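
    For the two tasks that DBMS_STATS cannot perform, the ANALYZE statement is still used. The lines below are a sketch that assumes the SCOTT.EMP demo table and a CHAINED_ROWS table already created in the current schema by the utlchain.sql script shipped under $ORACLE_HOME/rdbms/admin.

    ```sql
    -- Check the structural integrity of the table and its indexes.
    ANALYZE TABLE scott.emp VALIDATE STRUCTURE CASCADE;

    -- Record the rowids of chained or migrated rows in CHAINED_ROWS.
    ANALYZE TABLE scott.emp LIST CHAINED ROWS INTO chained_rows;

    -- Count what was found for this table.
    SELECT COUNT(*) AS chained_row_count
    FROM   chained_rows
    WHERE  owner_name = 'SCOTT'
    AND    table_name = 'EMP';
    ```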

    GATHER_TABLE_STATS
    ==========================
    DBMS_STATS.gather_table_stats
        (ownname varchar2,
         tabname varchar2,
         partname varchar2 default null,
         estimate_percent number default   to_estimate_percent_type(get_param('ESTIMATE_PERCENT')),
         block_sample boolean default FALSE,
         method_opt varchar2 default get_param('METHOD_OPT'),
         degree number default to_degree_type(get_param('DEGREE')),
         granularity varchar2 default get_param('GRANULARITY'),
         cascade boolean default to_cascade_type(get_param('CASCADE')),
         stattab varchar2 default null,
         statid varchar2 default null,
         statown varchar2 default null,
         no_invalidate boolean default to_no_invalidate_type(get_param('NO_INVALIDATE')),
         stattype varchar2 default 'DATA',
         force boolean default FALSE);
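
    Several defaults in this signature are read through get_param, so the values actually in effect can be checked directly. The query is a sketch for 10g; the default_* column aliases are introduced here for readability, and in 11g and later get_param is deprecated in favor of dbms_stats.get_prefs.

    ```sql
    -- Current defaults that gather_table_stats picks up when a parameter is omitted.
    SELECT dbms_stats.get_param('ESTIMATE_PERCENT') AS default_estimate_percent,
           dbms_stats.get_param('METHOD_OPT')       AS default_method_opt,
           dbms_stats.get_param('DEGREE')           AS default_degree,
           dbms_stats.get_param('GRANULARITY')      AS default_granularity,
           dbms_stats.get_param('CASCADE')          AS default_cascade,
           dbms_stats.get_param('NO_INVALIDATE')    AS default_no_invalidate
    FROM   dual;
    ```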

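    Parameters:
    ownname:          Owner of the table to analyze.
    tabname:          Name of the table to analyze.
    partname:         Partition name; only relevant for partitioned tables or partitioned indexes.
    estimate_percent: Percentage of rows to sample, in the range [0.000001,100]. NULL means analyze everything without sampling. The constant DBMS_STATS.AUTO_SAMPLE_SIZE is the default and lets Oracle choose the best sample size.
    block_sample:     Whether to use block sampling instead of row sampling.
    method_opt:       Controls how histogram information is gathered. Accepted values:
                      for all columns:         gather histograms on all columns.
                      for all indexed columns: gather histograms on all indexed columns.
                      for all hidden columns:  gather histograms on columns you cannot see (hidden columns).
                      for columns <list> SIZE <N> | REPEAT | AUTO | SKEWONLY: gather histograms on the listed columns.
                          N:        number of buckets, in the range [1,254].
                          REPEAT:   re-gather only the histograms that already exist.
                          AUTO:     Oracle decides the value of N.
                          SKEWONLY: create a histogram only where the data is skewed, that is, where multiple end-points have the same value.
    degree:           Degree of parallelism for gathering statistics. The default is NULL unless get_param('DEGREE') has been changed.
    granularity:      Granularity of statistics to collect; only pertinent if the table is partitioned.
    cascade:          Whether to gather statistics on the table's indexes as well. The default was FALSE in 9i; the signature above reads it from get_param('CASCADE').
    stattab:          Table in which to store the statistics. statid distinguishes sets of statistics when several are stored in the same stattab. statown is the owner of the statistics table. If none of these three parameters is specified, the statistics are written directly to the data dictionary.
    no_invalidate:    Does not invalidate the dependent cursors if set to TRUE. The procedure invalidates the dependent cursors immediately if set to FALSE.
    force:            Gather statistics even if the table's statistics are locked.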

    Example:
    execute dbms_stats.gather_table_stats(ownname => 'owner',
                                          tabname => 'table_name' ,
                                          estimate_percent => null ,
                                          method_opt => 'for all indexed columns' ,
                                          cascade => true);
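
    The stattab, statid, and statown parameters are most often used to keep a copy of statistics before re-gathering. The following is a sketch under assumptions: the statistics table name STATS_BACKUP, the statid BEFORE_REGATHER, and the SCOTT.EMP table are all placeholders chosen for illustration.

    ```sql
    begin
      -- One-time setup: create a user table that can hold exported statistics.
      dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'STATS_BACKUP');

      -- Save the current dictionary statistics for one table under a label.
      dbms_stats.export_table_stats(ownname => 'SCOTT',
                                    tabname => 'EMP',
                                    stattab => 'STATS_BACKUP',
                                    statid  => 'BEFORE_REGATHER',
                                    cascade => true);
    end;
    /

    -- Later, if the new statistics produce worse plans, restore the saved set.
    begin
      dbms_stats.import_table_stats(ownname => 'SCOTT',
                                    tabname => 'EMP',
                                    stattab => 'STATS_BACKUP',
                                    statid  => 'BEFORE_REGATHER',
                                    cascade => true);
    end;
    /
    ```
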
    GATHER_INDEX_STATS
    ==========================
    BEGIN
    SYS.DBMS_STATS.GATHER_INDEX_STATS (OwnName => 'ABC',
                                       IndName => 'IDX_FUNC_ABC',
                                       Estimate_Percent => 10,
                                       Degree => SYS.DBMS_STATS.DEFAULT_DEGREE,
                                       No_Invalidate => FALSE);
    END;

    ---------------------------------------
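    Automatic statistics gathering in 10g
    ---------------------------------------
    Starting with 10g, Oracle creates a scheduled job named GATHER_STATS_JOB by default when the database is created. It gathers CBO statistics automatically.
    By default the job's window is open from 10:00 PM to 6:00 AM on weeknights and all day on weekends.
    It calls DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC to gather the statistics. The procedure first detects objects with missing or stale statistics, then prioritizes them, and then gathers the statistics.

    You can check how the job has been running with the following query:
    SELECT * FROM Dba_Scheduler_Jobs WHERE Job_Name = 'GATHER_STATS_JOB';
    Another job, AUTO_SPACE_ADVISOR_JOB, also runs at 10:00 PM:
    SELECT Job_Name, Last_Start_Date FROM Dba_Scheduler_Jobs;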

    JOB_NAME                       LAST_START_DATE
    ------------------------------ ------------------------------------
    AUTO_SPACE_ADVISOR_JOB         30-OCT-08 10.00.01.463000 PM +08:00
    GATHER_STATS_JOB               30-OCT-08 10.00.01.463000 PM +08:00
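
    Dba_Scheduler_Jobs shows only the most recent start. To see whether past runs succeeded and how long they took, the scheduler's run history can be queried; this is a sketch, and the history it returns is limited by the scheduler's log retention setting.

    ```sql
    -- Outcome and duration of each recorded run of the statistics job.
    SELECT job_name, status, actual_start_date, run_duration
    FROM   dba_scheduler_job_run_details
    WHERE  job_name = 'GATHER_STATS_JOB'
    ORDER  BY actual_start_date DESC;
    ```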

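    However, this automation has disrupted normal operation on many systems; 10:00 PM is not an idle period for most production systems.
    Automatic analysis can also cause severe latch contention, which in turn may hang or crash the database.
    The recommendation here is therefore to disable automatic statistics gathering.
    There are two ways to disable and re-enable it:
    Method 1:
    exec dbms_scheduler.disable('SYS.GATHER_STATS_JOB');
    exec dbms_scheduler.enable('SYS.GATHER_STATS_JOB');
    Method 2 (a hidden parameter; with scope=spfile it takes effect only after the instance is restarted):
    alter system set "_optimizer_autostats_job"=false scope=spfile;
    alter system set "_optimizer_autostats_job"=true scope=spfile;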
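
    The commands above apply to 10g. In 11g and later, GATHER_STATS_JOB was replaced by an automated maintenance task, so the equivalent switch is in DBMS_AUTO_TASK_ADMIN. The following is a sketch for those releases, not part of the original 10g procedure.

    ```sql
    -- See which automated maintenance tasks exist and whether they are enabled.
    SELECT client_name, status
    FROM   dba_autotask_client;

    -- Disable automatic optimizer statistics collection in all windows.
    begin
      dbms_auto_task_admin.disable(client_name => 'auto optimizer stats collection',
                                   operation   => NULL,
                                   window_name => NULL);
    end;
    /

    -- Re-enable it later with the matching call.
    begin
      dbms_auto_task_admin.enable(client_name => 'auto optimizer stats collection',
                                  operation   => NULL,
                                  window_name => NULL);
    end;
    /
    ```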

    ---------------------------------------
    Viewing statistics
    ---------------------------------------
    Statistics on tables, indexes, and columns
    DBA_TABLES
    DBA_OBJECT_TABLES
    DBA_TAB_STATISTICS
    DBA_TAB_COL_STATISTICS
    DBA_TAB_HISTOGRAMS
    DBA_INDEXES
    DBA_IND_STATISTICS
    DBA_CLUSTERS
    DBA_TAB_PARTITIONS
    DBA_TAB_SUBPARTITIONS
    DBA_IND_PARTITIONS
    DBA_IND_SUBPARTITIONS
    DBA_PART_COL_STATISTICS
    DBA_PART_HISTOGRAMS
    DBA_SUBPART_COL_STATISTICS
    DBA_SUBPART_HISTOGRAMS
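
    As an example of using these views, DBA_TAB_STATISTICS can list the tables, partitions, and subpartitions whose statistics are missing or stale. This is a sketch that assumes the SCOTT schema and a 10g or later database, where the STALE_STATS column is available.

    ```sql
    -- Objects that have never been analyzed or that Oracle flags as stale.
    SELECT table_name, partition_name, subpartition_name, object_type,
           num_rows, last_analyzed, stale_stats
    FROM   dba_tab_statistics
    WHERE  owner = 'SCOTT'
    AND    (last_analyzed IS NULL OR stale_stats = 'YES')
    ORDER  BY table_name, partition_name, subpartition_name;
    ```
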
    ---------------------------------------
    Histogram statistics
    ---------------------------------------
    The histogram type is stored in the HISTOGRAM column of the *TAB_COL_STATISTICS views.
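
    A query along these lines lists the columns that currently have a histogram. It is a sketch assuming the SCOTT schema; the HISTOGRAM column reports NONE when no histogram exists, and otherwise the histogram type.

    ```sql
    -- Columns with a histogram, with the bucket count and histogram type.
    SELECT table_name, column_name, num_distinct, num_buckets, histogram
    FROM   dba_tab_col_statistics
    WHERE  owner = 'SCOTT'
    AND    histogram <> 'NONE'
    ORDER  BY table_name, column_name;
    ```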

    ------------------------------------------------------------------------------
    bde_last_analyzed.sql - Verifies CBO Statistics
    ------------------------------------------------------------------------------
    bde_last_analyzed.sql verifies the CBO statistics in the data dictionary for all tables, indexes, and partitions. It also validates the statistics on tables and indexes owned by 'SYS'.

    The 5 generated reports bde_last_analyzed_xxx.html, present the total of tables and indexes analyzed per module and per date.

    The script bde_last_analyzed.sql provided in this Note can be used on any 8i, 9i, 10g, 11g or higher database, including Oracle Apps 11i and R12 instances.

    For an ERP (Oracle Apps) database, connect as APPS; otherwise connect as any user with SYS privileges.
    #sqlplus <user>/<pwd>
         SQL> START bde_last_analyzed.sql

    Review the spool output files bde_last_analyzed_xxx.html. Spool files are created in the directory from which the script is executed. On NT, files may be created under $ORACLE_HOME/bin.

    If some modules have not been analyzed, or they have but not recently, these Apps objects must be analyzed using FND_STATS or coe_stats.sql if belonging to Oracle Apps. Otherwise use DBMS_STATS.
    If Oracle Apps, use corresponding concurrent program with an estimate of 10%, or execute equivalent FND_STATS procedure from SQL*Plus:
    SQL> exec fnd_stats.gather_schema_statistics('APPLSYS');
    Where 'APPLSYS' is the module (schema) that requires new statistics.

    If only a few tables require to have their statistics gathered, use the corresponding concurrent program to gather stats by table, or execute equivalent FND_STATS procedure from SQL*Plus:
    SQL> exec fnd_stats.gather_table_stats('MRP','MRP_FORECAST_DATES');
    Where 'MRP' is the schema owner, and 'MRP_FORECAST_DATES' is the table name. This syntax is only for non-partitioned Tables.

    If any Partitioned Table requires its Global Stats being rebuilt, it is because at some point you gathered Stats on the table using a granularity of PARTITION. See second method below:
    begin
    dbms_stats.delete_table_stats(ownname => 'APPLSYS', tabname => 'WF_ITEM_ACTIVITY_STATUSES');
    fnd_stats.gather_table_stats (ownname => 'APPLSYS', tabname => 'WF_ITEM_ACTIVITY_STATUSES',
                                    granularity => 'DEFAULT');
    end;
    /

    Once you fix your stats, be sure to ALWAYS use the granularity of DEFAULT for partitioned tables.
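
    Whether a partitioned table currently has true global statistics can be checked in the dictionary. The queries below are a sketch using the table from the example above; GLOBAL_STATS is YES when statistics were gathered for the object as a whole and NO when they were derived from the underlying partitions.

    ```sql
    -- Table level: were the statistics gathered globally or aggregated from partitions?
    SELECT table_name, global_stats, num_rows, last_analyzed
    FROM   dba_tables
    WHERE  owner = 'APPLSYS'
    AND    table_name = 'WF_ITEM_ACTIVITY_STATUSES';

    -- Partition level: the same flag for each partition.
    SELECT partition_name, global_stats, num_rows, last_analyzed
    FROM   dba_tab_partitions
    WHERE  table_owner = 'APPLSYS'
    AND    table_name = 'WF_ITEM_ACTIVITY_STATUSES'
    ORDER  BY partition_position;
    ```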

    If you want to execute the bde_last_analyzed.sql script against only one schema, modify the DEF SCHEMA line in the code.


    ---------------------------------------
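    Statistics on a partitioned table: a worked example
    ---------------------------------------
    Oracle's statistics play a very important role in SQL execution, and Oracle keeps separate statistics at every level of a table, describing the table and its columns. The composite-partitioned table below illustrates the common ones.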

    SQL>
    create table test
    partition by range(object_id)
    subpartition by hash(object_type) subpartitions 4
    (partition p1 values less than(10000),
    partition p2 values less than(20000),
    partition p3 values less than(30000),
    partition p4 values less than(maxvalue))
    as
    select * from dba_objects;

    Table created.
    SQL>
    BEGIN
    dbms_stats.gather_table_stats(ownname          => 'SCOTT',
                                    tabname          => 'TEST',
                                    estimate_percent => 100,
                                    block_sample     => FALSE,
                                    method_opt       => 'FOR ALL COLUMNS SIZE 10',
                                    granularity      => 'ALL',
                                    cascade          => TRUE);
    END;

    1. Table-level statistics

    SQL> select table_name,num_rows,blocks,empty_blocks,avg_space from user_tables where table_name = 'TEST';

    TABLE_NAME                       NUM_ROWS     BLOCKS EMPTY_BLOCKS AVG_SPACE
    ------------------------------ ---------- ---------- ------------ ----------
    TEST                                50705        788            0          0

    2. Column statistics for the table

    SQL> select table_name,column_name,num_distinct,density from user_tab_columns where table_name = 'TEST';

    TABLE_NAME                     COLUMN_NAME                    NUM_DISTINCT    DENSITY
    ------------------------------ ------------------------------ ------------ ----------
    TEST                           OWNER                                    25 .365014295
    TEST                           OBJECT_NAME                           30275 .000039205
    TEST                           SUBOBJECT_NAME                          191 .015657993
    TEST                           OBJECT_ID                             50705 .000019722
    TEST                           DATA_OBJECT_ID                         4334 .000248075
    TEST                           OBJECT_TYPE                              42 .271207855
    TEST                           CREATED                                2305 .001608457
    TEST                           LAST_DDL_TIME                          2369 .001566737
    TEST                           TIMESTAMP                              2412 .001610251
    TEST                           STATUS                                    2 .000009861
    TEST                           TEMPORARY                                 2 .000009861
    TEST                           GENERATED                                 2 .000009861
    TEST                           SECONDARY                                 2 .000009861

    13 rows selected.

    3. Histogram information for the table's columns

    SQL>
    select table_name,column_name,endpoint_number,endpoint_value
    from user_tab_histograms
    where table_name = 'TEST'
    and column_name = 'OBJECT_ID';

    TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE
    ---------- ---------- --------------- --------------
    TEST       OBJECT_ID                0              2
    TEST       OBJECT_ID                1           5160
    TEST       OBJECT_ID                2          10587
    TEST       OBJECT_ID                3          15658
    TEST       OBJECT_ID                4          20729
    TEST       OBJECT_ID                5          25800
    TEST       OBJECT_ID                6          30870
    TEST       OBJECT_ID                7          35940
    TEST       OBJECT_ID                8          41089
    TEST       OBJECT_ID                9          46821
    TEST       OBJECT_ID               10          53497
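
    With SIZE 10 on a column that has far more than 10 distinct values, the endpoints above mark bucket boundaries, and each bucket covers roughly the same number of rows. The query below is a sketch for reading such output: value_range is a derived column introduced here, and a narrower range means the values in that bucket are more densely packed.

    ```sql
    -- Width of the value range covered by each histogram bucket.
    SELECT endpoint_number,
           endpoint_value,
           endpoint_value
             - LAG(endpoint_value) OVER (ORDER BY endpoint_number) AS value_range
    FROM   user_tab_histograms
    WHERE  table_name = 'TEST'
    AND    column_name = 'OBJECT_ID'
    ORDER  BY endpoint_number;
    ```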

    4. Partition statistics

    SQL>
    select partition_name,num_rows,blocks,empty_blocks,avg_space
    from user_tab_partitions
    where table_name = 'TEST';

    PARTITION_NAME    NUM_ROWS     BLOCKS EMPTY_BLOCKS AVG_SPACE
    --------------- ---------- ---------- ------------ ----------
    P1                    9581        140            0          0
    P2                    9973        164            0          0
    P3                   10000        158            0          0
    P4                   21151        326            0          0

    5. Column statistics for a partition

    SQL> select column_name,num_distinct,density,num_nulls
    from user_part_col_statistics
    where table_name = 'TEST'
    and partition_name = 'P1';

    COLUMN_NAME     NUM_DISTINCT    DENSITY NUM_NULLS
    --------------- ------------ ---------- ----------
    OWNER                      7 .000052187          0
    OBJECT_NAME             7412 .000156925          0
    SUBOBJECT_NAME            26 .47017301       9496
    OBJECT_ID               9581 .000104373          0
    DATA_OBJECT_ID          1765 .000664385       7780
    OBJECT_TYPE               34 .18494854          0
    CREATED                  913 .001977449          0
    LAST_DDL_TIME            994 .001882695          0
    TIMESTAMP                982 .001928775          0
    STATUS                     2 .000052187          0
    TEMPORARY                  2 .000052187          0
    GENERATED                  2 .000052187          0
    SECONDARY                  1 .000052187          0


    6. Histogram information for a partition's columns

    SQL> select column_name,bucket_number,endpoint_value
    from user_part_histograms
    where table_name = 'TEST'
    and partition_name = 'P1'
    and column_name = 'OBJECT_ID';

    COLUMN_NAME     BUCKET_NUMBER ENDPOINT_VALUE
    --------------- ------------- --------------
    OBJECT_ID                   0              2
    OBJECT_ID                   1           1005
    OBJECT_ID                   2           1963
    OBJECT_ID                   3           2921
    OBJECT_ID                   4           3888
    OBJECT_ID                   5           4859
    OBJECT_ID                   6           5941
    OBJECT_ID                   7           6899
    OBJECT_ID                   8           7885
    OBJECT_ID                   9           8864
    OBJECT_ID                  10           9999


    7. Subpartition statistics

    SQL> select subpartition_name,num_rows,blocks,empty_blocks
    from user_tab_subpartitions
    where table_name = 'TEST'
    and partition_name = 'P1';

    SUBPARTITION_NAME                NUM_ROWS     BLOCKS EMPTY_BLOCKS
    ------------------------------ ---------- ---------- ------------
    SYS_SUBP21                           3597         50            0
    SYS_SUBP22                           3566         52            0
    SYS_SUBP23                            637         11            0
    SYS_SUBP24                           1781         27            0

    8. Column statistics for a subpartition

    SQL> select column_name,num_distinct,density
    from user_subpart_col_statistics
    where table_name = 'TEST'
    and subpartition_name = 'SYS_SUBP21';
    COLUMN_NAME     NUM_DISTINCT    DENSITY
    --------------- ------------ ----------
    OWNER                      6 .000139005
    OBJECT_NAME             3595 .000278319
    SUBOBJECT_NAME             4 .014285714
    OBJECT_ID               3597 .000278009
    DATA_OBJECT_ID           155 .006451613
    OBJECT_TYPE                8 .000139005
    CREATED                  751 .002392334
    LAST_DDL_TIME            784 .002302524
    TIMESTAMP                768 .00235539
    STATUS                     1 .000139005
    TEMPORARY                  2 .000139005
    GENERATED                  2 .000139005
    SECONDARY                  1 .000139005

    9. Histogram information for a subpartition's columns

    SQL> select column_name,bucket_number,endpoint_value
    from user_subpart_histograms
    where table_name = 'TEST'
    and subpartition_name = 'SYS_SUBP21'
    and column_name = 'OBJECT_ID';
    COLUMN_NAME     BUCKET_NUMBER ENDPOINT_VALUE
    --------------- ------------- --------------
    OBJECT_ID                   0            208
    OBJECT_ID                   1           1525
    OBJECT_ID                   2           2244
    OBJECT_ID                   3           2892
    OBJECT_ID                   4           3252
    OBJECT_ID                   5           4047
    OBJECT_ID                   6           5238
    OBJECT_ID                   7           6531
    OBJECT_ID                   8           7661
    OBJECT_ID                   9           8474
    OBJECT_ID                  10           9998

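    Analyzing this composite-partitioned table produced the nine levels of statistics shown above. The CBO needs this much statistical information to arrive at an efficient execution plan.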
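
    The table, partition, and subpartition levels can also be summarized in a single query against USER_TAB_STATISTICS. This is a sketch that assumes the TEST table created above and a 10g or later database; it covers the segment-level statistics only, not the column or histogram levels.

    ```sql
    -- One row per level: how many segments carry statistics and their row totals.
    SELECT object_type,
           COUNT(*)           AS segments,
           SUM(num_rows)      AS total_rows,
           MIN(last_analyzed) AS oldest_analysis
    FROM   user_tab_statistics
    WHERE  table_name = 'TEST'
    GROUP  BY object_type
    ORDER  BY object_type;
    ```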

  • Original article: https://www.cnblogs.com/weixun/p/3011124.html