• [20191127]探究等待事件的本源4.txt


    [20191127]探究等待事件的本源4.txt

    --//昨天使用ash_wait_chains.sql脚本把各个生产库执行1遍,才发现我对一套系统性能理解错误.
    --//我一直以为这套系统存储有点问题,性能不是很好.
    1.环境:
    xxxxxx> @ ver1
    PORT_STRING                    VERSION        BANNER
    ------------------------------ -------------- --------------------------------------------------------------------------------
    x86_64/Linux 2.4.xx            11.2.0.4.0     Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

    2.分析:
    xxxxxx> @ tpt/ash/ash_wait_chains event2 1=1 trunc(sysdate)+8/24  sysdate
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- -------------------------------------------------------------
      39%        2667         .5 -> ON CPU
      18%        1274         .2 -> LNS wait on SENDREQ
      18%        1249         .2 -> log file sync  -> LGWR-LNS wait on channel
      18%        1215         .2 -> LGWR-LNS wait on channel
       1%          62          0 -> log file sync
       1%          60          0 -> log file parallel write
       1%          46          0 -> control file sequential read
       1%          39          0 -> db file sequential read
       1%          36          0 -> log file sync  -> log file parallel write
       0%          29          0 -> gc cr block 2-way
       0%          24          0 -> gc current block 2-way
       0%          23          0 -> null event
       0%          21          0 -> direct path read
       0%          20          0 -> log file sync  -> ON CPU
       0%          18          0 -> gcs log flush sync  -> LGWR-LNS wait on channel
       0%          18          0 -> gc cr block busy
       0%          16          0 -> db file parallel write
       0%          12          0 -> SQL*Net more data to client
       0%          10          0 -> gc cr multi block request
       0%           9          0 -> gc current grant busy
       0%           7          0 -> Disk file operations I/O
       0%           6          0 -> ASM file metadata operation
       0%           5          0 -> log file sequential read
       0%           4          0 -> LGWR wait on LNS
       0%           4          0 -> log file switch completion  -> LGWR-LNS wait on channel
       0%           4          0 -> IPC send completion sync
       0%           4          0 -> control file parallel write
       0%           3          0 -> reliable message
       0%           3          0 -> log file sync  -> LGWR wait on LNS
       0%           2          0 -> CGS wait for IPC msg
    30 rows selected.

    xxxxxx> select sysdate from dual ;
    SYSDATE
    -------------------
    2019-11-27 09:27:51

    xxxxxx> @ tpt/ash/ash_wait_chains event2 "event='log file sync'" trunc(sysdate)+8/24  sysdate
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- ----------------------------------------------
      91%        1249         .2 -> log file sync  -> LGWR-LNS wait on channel
       5%          62          0 -> log file sync
       3%          36          0 -> log file sync  -> log file parallel write
       1%          20          0 -> log file sync  -> ON CPU
       0%           3          0 -> log file sync  -> LGWR wait on LNS

    --//实际上主要问题在于log_archive_dest_2参数设置不合理,采用sync参数.
    xxxxxx> show parameter log_archive_dest_2
    NAME               TYPE   VALUE
    ------------------ ------ ----------------------------------------------------------------------------------------------------
    log_archive_dest_2 string service=xxxxxx lgwr sync reopen=15 max_failure=10 net_timeout=30 optional noaffirm db_unique_name=xxxxx

    xxxxxx> show parameter log_archive_config
    NAME               TYPE   VALUE
    ------------------ ------ -----------
    log_archive_config string nodg_config

    --//实际上我第一次看应该是去年的春节前后,刚上线,当时这个问题没有严重,现在显得越来越明显.
    --//"可怕"的是我不能修改这个参数,这个所谓的dg是一个第三方安装的东西,根本不是什么dg,我一修改参数,对方的软件视乎就检测我的改动,自动重置回来.
    --//alter system set log_archive_dest_2="service=xxxxxx lgwr async reopen=15 max_failure=10 net_timeout=30 optional noaffirm db_unique_name=xxxxxx";
    --//btw:好像这个参数可以修改,也许我记错了,不能修改log_archive_config参数,等一段时间观察看看.

    --//看awr报表:
    Top 10 Foreground Events by Total Wait Time
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                                Tota    Wait   % DB           
    Event                                 Waits Time Avg(ms)   time Wait Class
    ------------------------------ ------------ ---- ------- ------ ----------
    DB CPU                                      1063           78.5           
    log file sync                        46,848 311.       7   23.0 Commit    
    control file sequential read         68,350 21.1       0    1.6 System I/O
    gc cr block 2-way                    17,827  8.2       0     .6 Cluster   
    gc current block 2-way               17,190  7.4       0     .5 Cluster   
    gc cr block busy                        218  6.8      31     .5 Cluster   
    SQL*Net more data to client         541,828  6.4       0     .5 Network   
    db file sequential read                 936    6       6     .4 User I/O  
    direct path read                      1,591  5.2       3     .4 User I/O  
    Disk file operations I/O             23,740  4.6       0     .3 User I/O  

    --//log file sync 平均等待7ms,到业务高峰可以到达13-14ms.我开始以为磁盘io不行,或者使用asm的原因.实际是就是上面的参数设置不合理.
    --//如果不小心选择优化磁盘IO,那就选择错误的优化方向.实际上这套系统cpu充足,磁盘性能也不差,典型的大马拉小车.
    --//主要问题在于应用软件垃圾(不仅仅指这个dg设置)!!也就是我以前提得良好的硬件掩盖拙劣的应用设计.

    --//再来看看control file sequential read等待事件:

    IOStat by Filetype summary           DB/Inst: zzzzz/zzzzz1  Snaps: 25477-25478
    -> 'Data' columns suffixed with M,G,T,P are in multiples of 1024
        other columns suffixed with K,M,G,T,P are in multiples of 1000
    -> Small Read and Large Read are average service times, in milliseconds
    -> Ordered by (Data Read + Write) desc

                    Reads:   Reqs   Data    Writes:  Reqs   Data      Small   Large
    Filetype Name   Data    per sec per sec Data    per sec per sec    Read    Read
    --------------- ------- ------- ------- ------- ------- ------- ------- -------
    Control File       8.3G    27.2  2.382M    228M     3.6   .064M     0.1     0.8
    Log File           206M     0.1   .058M    228M    20.4   .064M     1.5     6.0
    Data File           64M     2.2   .018M    235M     4.6   .066M     2.3     N/A
    Archive Log          1M     0.2      0M    205M     0.1   .057M     0.0     N/A
    Temp File           23M     0.0   .006M     23M     0.0   .006M     0.0     1.7
    TOTAL:             8.6G    29.7  2.464M    919M    28.6   .257M     0.3     0.9
                              ------------------------------------------------------
    --//实际上对方的软件简直是变态,关于这个问题的描述在链接如下,不再分析展开贴出:
    --//http://blog.itpub.net/267265/viewspace-2222146/=>[20181129]大量的control file sequential read.txt.

    3.修改参数后的观察:
    --//检查alert发现,是Wed Nov 27 09:33:31 2019修改参数
    Wed Nov 27 09:33:31 2019
    ALTER SYSTEM SET log_archive_dest_2='service=xxxxx lgwr async reopen=15 max_failure=10 net_timeout=30 optional noaffirm db_unique_name=xxxxx' SCOPE=BOTH;

    xxxxxx> select sysdate from dual ;
    SYSDATE
    -------------------
    2019-11-27 10:08:20

    xxxxxx> @ tpt/ash/ash_wait_chains event2 "event='log file sync'" trunc(sysdate)+9/24+33/1440  sysdate
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- -----------------------------------------------------------------------------------------------
      57%          26          0 -> log file sync  -> LGWR-LNS wait on channel
      33%          15          0 -> log file sync
       9%           4          0 -> log file sync  -> log file parallel write
       2%           1          0 -> log file sync  -> ON CPU

    xxxxxx> @ tpt/ash/ash_wait_chains event2 "event='log file sync'" trunc(sysdate)+9/24+34/1440  sysdate
    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- -----------------------------------------------------------------------------------------------
      74%          14          0 -> log file sync
      21%           4          0 -> log file sync  -> log file parallel write
       5%           1          0 -> log file sync  -> ON CPU

    --//你可以对比看出取值范围9:33换成9:34,多了1分钟由LGWR-LNS wait on channel引起的log file sync占26秒,而9:34后的查询完全看不到这个情况.

    xxxxxx> @ tpt/ash/ash_wait_chains program2||event2 1=1 trunc(sysdate)+9/24+34/1440 sysdate

    -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com )
    %This     SECONDS        AAS WAIT_CHAIN
    ------ ---------- ---------- -----------------------------------------------------------------------------------------------
      36%         547         .2 -> (zzzzzz.EXE) ON CPU
      13%         204         .1 -> (wnwp.exe) ON CPU
       9%         145         .1 -> (zzzzzz.EXE) ON CPU
       7%         107          0 -> (NSAn) ON CPU
       5%          80          0 -> (CAPAA-PIPE) ON CPU
       3%          40          0 -> (httpd.exe) ON CPU
       2%          31          0 -> (sqlplus) ON CPU
       2%          31          0 -> (DIAn) ON CPU
       2%          24          0 -> (oracle) ON CPU
       1%          22          0 -> (LGWR) ON CPU
       1%          22          0 -> (PSPn) ON CPU
       1%          21          0 -> (LGWR) log file parallel write
       1%          16          0 -> (NSAn) LNS wait on SENDREQ
       1%          15          0 -> (sqlplus) control file sequential read
       1%          15          0 -> (LMSn) ON CPU
       1%          13          0 -> (zzzzzz.EXE) db file sequential read
       1%          12          0 -> (zzzzzz.EXE) gc cr block 2-way
       1%          12          0 -> (routine.exe) ON CPU
       1%          12          0 -> (PlSqlDev.exe) ON CPU
       1%          10          0 -> (Toad.exe) ON CPU
       1%          10          0 -> (DBWn) db file parallel write
       0%           7          0 -> (DBWn) ON CPU
       0%           7          0 -> (wnwp.exe) log file sync
       0%           6          0 -> (zzzzzz.EXE) gc current block 2-way
       0%           6          0 -> (zzzzzz.EXE) log file sync
       0%           6          0 -> (zzzzzz.EXE) direct path read
       0%           5          0 -> (zzzzzz.EXE) log file sync
       0%           5          0 -> (LMON) ON CPU
       0%           4          0 -> (wnwp.exe) log file sync  -> (LGWR) log file parallel write
       0%           4          0 -> (MMON) ON CPU
    30 rows selected.

  • 相关阅读:
    网站常见的报错
    数据库的备份与还原
    纵向tab标签切换效果
    CSS3动画 transition 过渡,宽度由中间向两边发生过渡
    移入切换图片路径;移出再切回去
    动态修改手机号
    json基本介绍及使用
    py 封装 操作json 的方法
    通过htmltestrunner模块美化报告
    记:运行测试case遇到的问题!
  • 原文地址:https://www.cnblogs.com/lfree/p/11940259.html
Copyright © 2020-2023  润新知