• [20191125]探究等待事件的本源.txt


    [20191125]探究等待事件的本源.txt

    --//当工作中遇到oracle的性能问题时,查看awr报表提供很好的解决问题途径.但是有时候很容易想当然.
    --//比如以前我一看到 log file sync等待事件就很主观的认为redo 磁盘IO不行,实际上真实的情况可能有许多原因.
    --//比如提交进程太多,cpu调度不过来.总之需要wait chains确定问题的本源.
    --//上午看了大师Tanel Poder的dash_wait_chains.sql与ash_wait_chains.sql脚本,简单学习它的使用.
    --//dash_wait_chains.sql 是基于DBA_HIST_ACTIVE_SESS_HISTORY视图.而ash_wait_chains.sql基于V$ACTIVE_SESSION_HISTORY视图.
    --//二者命令执行格式差不多,仅仅学习ash_wait_chains.sql就ok了.

    --//实际上根本不需要学习与记忆,看看脚本的开头就知道如何执行与使用:
    -- Usage:
    --     @ash_wait_chains <grouping_cols> <filters> <fromtime> <totime>
    --
    -- Example:
    --     @ash_wait_chains username||':'||program2||event2 session_type='FOREGROUND' sysdate-1/24 sysdate
    --
    -- Other:
    --     This script uses only the in-memory V$ACTIVE_SESSION_HISTORY, use
    --     @dash_wait_chains.sql for accessiong the DBA_HIST_ACTIVE_SESS_HISTORY archive
    --
    --     Oracle 10g does not  have the BLOCKING_INST_ID column in ASH so you'll need
    --     to comment out this column in this script. This may give you somewhat
    --     incorrect results in RAC environment with global blockers.
    --
    --------------------------------------------------------------------------------

    --//我做一点点简单的修改:
    COL wait_chain FOR A300 WORD_WRAP
    --//修改为:
    COL wait_chain FOR A200 WORD_WRAP

    --//我的工作环境最大显示宽度是271列.如果设置再小,字体显示就太小,我同事许多设置才237.
    $ echo $COLUMNS
    271

    --//因为我还需要管理10g的数据库,根据前面的说明,可以拷贝脚本命名为ash_wait_chains10g.sql.
    --//注解如下,当然对于rac环境可能不正确.
    --         AND PRIOR d.blocking_inst_id = d.inst_id

    --//作者还重新定义字段program,event为program2,event2.这样显示更加直观,占用宽度更小一些.
    --//以下是测试部分:

    1.环境:
    qqqqqqqqqqq> @ ver1
    PORT_STRING         VERSION    BANNER
    ------------------- ---------- ----------------------------------------------------------------------------
    x86_64/Linux 2.4.xx 11.2.0.4.0 Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    qqqqqqqqqqq> @ tpt/ash/ash_wait_chains  program2||':'||event2 1=1 trunc(sysdate)+9/24 trunc(sysdate)+10/24
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- -------------------------------------------------------------------
      33%        4362         .1 -> (XXXYYY.EXE) :ON CPU
      10%        1368          0 -> (NSSn) :LNS wait on SENDREQ
      10%        1323          0 -> (LGWR) :LGWR-LNS wait on channel
       6%         735          0 -> (XXXYYY.EXE) :log file sync  -> (LGWR) :LGWR-LNS wait on channel
       5%         605          0 -> (CAPAA-PIPE) :ON CPU
       5%         604          0 -> (XXXYYY.exe) :ON CPU
       4%         573          0 -> (wnwp.exe) :ON CPU
       3%         378          0 -> (DIAn) :ON CPU
       2%         274          0 -> (sqlplus) :ON CPU
       2%         246          0 -> (httpd.exe) :ON CPU
       2%         230          0 -> (PSPn) :ON CPU
       1%         115          0 -> (CKPT) :ON CPU
       1%         111          0 -> (LMSn) :ON CPU
       1%         111          0 -> (wnwp.exe) :log file sync  -> (LGWR) :LGWR-LNS wait on channel
       1%          89          0 -> (routine.exe) :ON CPU
       1%          87          0 -> (oracle) :ON CPU
       1%          86          0 -> (sqlplus) :control file sequential read
       1%          83          0 -> (LGWR) :ON CPU
       1%          76          0 -> (routine.exe) :log file sync  -> (LGWR) :LGWR-LNS wait on channel
       1%          72          0 -> (ARCn) :ON CPU
       1%          69          0 -> (XXXYYY.EXE) :gc current block 2-way
       1%          68          0 -> (XXXYYY.EXE) :gc cr block 2-way
       0%          63          0 -> (LGWR) :log file parallel write
       0%          55          0 -> (LMON) :ON CPU
       0%          55          0 -> () :null event
       0%          47          0 -> (XXXYYY.EXE) :log file sync
       0%          46          0 -> (nnnn.exe) :log file sync  -> (LGWR) :LGWR-LNS wait on channel
       0%          39          0 -> (LMSn) :gcs log flush sync  -> (LGWR) :LGWR-LNS wait on channel
       0%          38          0 -> (XXXYYY.EXE) :direct path read
       0%          38          0 -> (XXXYYY.EXE) :gc cr block busy
    30 rows selected.

    --//开始不理解AAS表示什么.表示如下
    ROUND(COUNT(*) / ((CAST(&4 AS DATE) - CAST(&3 AS DATE)) * 86400), 1) AAS
    --//相当于每秒取样几次.
    --//参数3,参数4是时间范围,我建议使用类似例子的格式.你仅仅需要知道1/24表示1小时,1/1440表示1分钟就ok了.
    --//比如你需要查看当天8:15到9:10的情况,时间可以写成:
    trunc(sysdate)+8/24+15/1440 trunc(sysdate)+9/24+10/1440
    --//另外注意一点:V$ACTIVE_SESSION_HISTORY保留信息有限制,时间太久的可能查询不到.要改用dash_wait_chains.sql脚本查询.
    --//参数2可以加入一些过滤例子:session_type='FOREGROUND' 或者使用 1=1 表示全部.

    --//你可以看到这台服务器的log file sync等待事件实际上由于(LGWR) :LGWR-LNS wait on channel.
    --//问题在与安装配置dg时的参数:

    qqqqqqqqqqq> show parameter log_archive_dest_2
    NAME               TYPE   VALUE
    ------------------ ------ ----------------------------------------------------------------------------------------------------
    log_archive_dest_2 string service=rzdbra lgwr sync reopen=15 max_failure=10 net_timeout=30 optional noaffirm db_unique_name=XXXXXX
    --//对方配置了sync,这样在提交时要确定dg方已经写入日志文件才算提交.
    --//如果改成async,相关等待LNS wait on SENDREQ,LGWR-LNS wait on channel,log file sync都会消失或者减少.

    --//再来看看另外1个10g的数据库:
    WWWWWWWWW> @ ver1
    PORT_STRING         VERSION        BANNER
    ------------------- -------------- ----------------------------------------------------------------
    x86_64/Linux 2.4.xx 10.2.0.4.0     Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi

    WWWWWWWWW> @ tpt/ash/ash_wait_chains10g  program2||':'||event2 1=1 trunc(sysdate)+9/24 trunc(sysdate)+10/24
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- ------------------------------------------------------------------------
      30%        3602         .1 -> (xxxzzzz.exe) :db file sequential read
      19%        2244          0 -> (DBWn) :ON CPU
      11%        1321          0 -> (LGWR) :log file parallel write
       9%        1113          0 -> (xxxzzzz.exe) :ON CPU
       7%         890          0 -> (xxxzzzz.exe) :db file scattered read
       6%         784          0 -> (xxxzzzz.exe) :log file sync  -> (LGWR) :log file parallel write
       5%         633          0 -> (aaatobbbb.exe) :log file sync  -> (LGWR) :log file parallel write
       4%         490          0 -> (CKPT) :control file parallel write
       1%         106          0 -> (wnwp.exe) :db file sequential read
       1%          88          0 -> (JDBC Thin Client) :db file sequential read
       1%          86          0 -> (aaatobbbb.exe) :db file scattered read
       1%          82          0 -> (mnnn) :db file sequential read
       0%          59          0 -> (xxxzzzz.exe) :read by other session
       0%          55          0 -> (wnwp.exe) :ON CPU
       0%          54          0 -> (JDBC Thin Client) :ON CPU
       0%          45          0 -> (plsqldev.exe) :enq: TX - row lock contention
       0%          44          0 -> (sqlplus) :ON CPU
       0%          27          0 -> (wnwp.exe) :log file sync  -> (LGWR) :log file parallel write
       0%          27          0 -> (aaatobbbb.exe) :ON CPU
       0%          25          0 -> (xxxzzzz.exe) :SQL*Net more data from dblink
       0%          25          0 -> (mnnn) :ON CPU
       0%          24          0 -> (ARCn) :log file sequential read
       0%          19          0 -> (xxxzzzz.exe) :db file parallel read
       0%          13          0 -> (CTWR) :ON CPU
       0%          13          0 -> (aaatobbbb.exe) :db file sequential read
       0%          12          0 -> (Jnnn) :db file sequential read
       0%          10          0 -> (aaatobbbb.exe) :log file sync
       0%          10          0 -> (SMON) :db file sequential read
       0%          10          0 -> (ARCn) :ON CPU
       0%          10          0 -> (CJQn) :ON CPU
    30 rows selected.
    --//可以发现这台系统磁盘io已经不行.如果负荷在增加就不行.顺便说一下这台机器硬件配置很差.

    WWWWWWWWW> @ tpt/ash/ash_wait_chains10g  event2 1=1 trunc(sysdate)+7/24 sysdate
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- -----------------------------------------------------------
      32%        4910         .1 -> db file sequential read
      32%        4887         .1 -> ON CPU
      12%        1896          0 -> log file sync  -> log file parallel write
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      11%        1761          0 -> log file parallel write
       7%        1078          0 -> db file scattered read
       4%         650          0 -> control file parallel write
       0%          59          0 -> read by other session
       0%          45          0 -> enq: TX - row lock contention
       0%          34          0 -> SQL*Net more data to client
       0%          34          0 -> SQL*Net more data from dblink
       0%          24          0 -> log file sequential read
       0%          23          0 -> db file parallel read
       0%          21          0 -> log file sync
       0%           7          0 -> log file switch completion
       0%           6          0 -> null event
       0%           5          0 -> change tracking file synchronous write
       0%           4          0 -> SQL*Net more data from client
       0%           4          0 -> control file sequential read
       0%           3          0 -> os thread startup
       0%           2          0 -> enq: CF - contention  -> control file parallel write
       0%           1          0 -> SQL*Net break/reset to client
       0%           1          0 -> Log archive I/O
       0%           1          0 -> latch: library cache
       0%           1          0 -> log file sync  -> ON CPU
    24 rows selected.

    --//注意看下划线,log file sync主要是由于log file parallel write太慢造成的,与前面的不同.这样定位问题就不会太盲目乱猜.

  • 相关阅读:
    CF763C Timofey and Remoduling
    CF762E Radio Stations
    CF762D Maximum Path
    CF763B Timofey and Rectangles
    URAL1696 Salary for Robots
    uva10884 Persephone
    LA4273 Post Offices
    SCU3037 Painting the Balls
    poj3375 Network Connection
    Golang zip压缩文件读写操作
  • 原文地址:https://www.cnblogs.com/lfree/p/11928763.html
Copyright © 2020-2023  润新知