• [Oracle运维工程师手记] 如何从trace 文件,判断是否执行了并行


    [Oracle运维工程师手记系列]如何从trace 文件,判断是否执行了并行

    客户说,明明指定了并行的hint,OEM 却报说没有并行,并且提供了画面。

    客户的SQL文长这样:

    INSERT/*+ parallel(4) */ INTO TAB001_WORK SELECT/*+ FULL(USR002) */
    USR002.IM_PRO_CD, USR002.IM_NO, USR002.PS_DATE, USR002.YY_MM,

    到底如何呢,口说无凭,还是来作一次并行trace吧。

    SQL> alter session set tracefile_identifier='test1';
    SQL> alter session set events 'trace[px_messaging] disk highest';
    SQL> alter session set events 'trace[px_control] disk highest';
    SQL> alter session set events 'trace[px_scheduler] disk high';
    SQL> alter session set events 'trace[sql_compiler.*] disk medium';
    SQL> 执行 出现问题的 SQL 文
    SQL> exit;

    这样,会一次性地得到几个trace 文件。

    MBL12_ora_92961_test1.trc
    MBL11_p000_33866_test1.trc
    MBL11_p001_32473_test1.trc
    MBL11_p002_32741_test1.trc
    ......

    第一个文件: 是 QC 文件。其他的应当是它生成的 Parallel 文件。

    应当主要从QC 文件着手进行分析:

    **************************
    Automatic degree of parallelism (ADOP)
    **************************
    kkfdtParallel: parallel is possible (no statement type restrictions)
    kkfdIsAutoDopSupported:Yes, ctxoct:2, boostrap SQL?:FALSE, remote?:FALSE, stmt?:FALSE.
    Automatic degree of parallelism is enabled for this statement in hint mode.
    kkopqSetForceParallelProperties: Hint:yes
    Query: compute:no  forced:yes forceDop:4
    DDLDML : compute:no  forced:yes forceDop:4
    kkopqSetDopReason: Reason why we chose this DOP is: hint. <<<<< 此处是说明通过hint ,认识到要执行 parallel 处理。
    hint forces parallelism with dop=4

    下面,内容是这样的:

    2018-04-24 09:36:23.759351*:PX_Messaging:kxfp.c@18652:kxfpclinfo():
    load balancing disabled due to single PQ running (non-QA mode)on local inst 2 (total # inst: 2).
    (default: 0) inst target is 40
    number of active slaves on the instance: 0,
    number of active slaves but available to use: 0 <<<<< 一开始的时候,没有可用的资源

    接下来,可以看到,正尝试着去获得slave 进程:

    2018-04-24 09:36:23.760357*:PX_Messaging:kxfp.c@11588:kxfpg1srv(): trying to get slave P000 on instance 1 for q=0x80d9eac90
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11631:kxfpg1srv(): slave P000 is remote (inst=1) 2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11669:kxfpg1srv(): - acquired dp=(nil)
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11104:kxfpg1sg(): Got It. 1 so far.
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11588:kxfpg1srv(): trying to get slave P001 on instance 1 for q=0x80d9eac90
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11631:kxfpg1srv(): slave P001 is remote (inst=1) 2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11669:kxfpg1srv(): - acquired dp=(nil)
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11104:kxfpg1sg(): Got It. 2 so far.
    jStart=0 jEnd=60 jIncr=1 isGV=0 i=1 instno=2 kxfpilthno=2
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11588:kxfpg1srv(): trying to get slave P000 on instance 2 for q=0x80d9eac90
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11686:kxfpg1srv(): slave P000 is local
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11716:kxfpg1srv(): found dp=0x85a9c9c68 flg=18
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11747:kxfpg1srv(): local slave already started.. sid = 0, iid = 2
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11104:kxfpg1sg(): Got It. 3 so far.
    2018-04-24 09:36:23.760927*:PX_Messaging:kxfp.c@11588:kxfpg1srv(): trying to get slave P001 on instance 2 for q=0x80d9eac90
    2018-04-24 09:36:23.761444*:PX_Messaging:kxfp.c@11686:kxfpg1srv(): slave P001 is local
    2018-04-24 09:36:23.761444*:PX_Messaging:kxfp.c@11716:kxfpg1srv(): found dp=0x85a9c9cf8 flg=18
    2018-04-24 09:36:23.761444*:PX_Messaging:kxfp.c@11747:kxfpg1srv(): local slave already started.. sid = 1, iid = 2
    2018-04-24 09:36:23.761444*:PX_Messaging:kxfp.c@11104:kxfpg1sg(): Got It. 4 so far.
    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<  上面的信息表明开始试着去获取并行slave 进程,并渐次取得成功

    再往下看,可以看到确实获得了想要的那些个并行进程:

    2018-04-24 09:36:23.774484*:PX_Messaging:kxfp.c@11536:kxfpg1sg():
    got 4 servers (sync), errors=0x0 returning   <<<<<<<<<<<<<<<<<<<<<<<<<    从这里也可以看得出来,确实取得了4个并行进程  
    Acquired 8 slaves on 2 instances avg height=2 #set=2 qser=14923778 <<<<<<<<  估计因为有两个 set 的缘故,故此得到了8个并行进程
    P000 inst 1 spid 33866
    P001 inst 1 spid 32473
    P000 inst 2 spid 81205
    P001 inst 2 spid 88210
    P002 inst 1 spid 32741
    P003 inst 1 spid 38214
    P002 inst 2 spid 82324
    P003 inst 2 spid 88129
    2018-04-24 09:36:23.774484*:PX_Messaging:kxfp.c@10729:kxfpgsg():
    Instance(servers):
    inst=1 #slvs=4
    inst=2 #slvs=4

    接下来,可以看到 QC 正在和 各个Parallel进程 set 进行通信:

          QC enabled kgl EXPRESS bit on slave for SQL:
    ----- Current SQL Statement for this session (sql_id=cay1xmzv2mtyz) -----
    INSERT/*+ parallel(4) */ INTO TAB001_WORK SELECT/*+ FULL(USR002) */ USR002.IM_PRO_CD, USR002.IM_NO, USR002.PS_DATE, USR002.YY_MM, USR002.YEAR,

    .....

           size:  40 aligned:  40 total:9104 rem:7008 to:   4
           Sending parse to slave set 1:  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< QC 和 set 1 进行通信
             User sqllen sent from QC = 7317

    ......

    2018-04-24 09:36:23.793626*:PX_Messaging:kxfp.c@5138:kxfpqrsnd():
    Deliver to qref=0x85e94cd8 (points at 2.-1) msg=0x7bff77fd0 flg=0x1
     qref: qrser=14923778 qrseq=5 qrflg=0x1 fmh=0x2 state=00010
     msg:  mhser=14923778 mhseq=6 mhst=3 mhty=6 from=0x85e974c8
    qref flow mode now 0x1
    Receiver qref ending state=11010
    after rsnd/qsnd for qref=0x85e974c8 state=10011 slm=0 bn=1 b0=0x7bff4dfb8 b1=(nil)
    after buf swap qref=0x85e974c8 state=00000 slm=0 bn=1 b0=(nil) b1=0x7bff4dfb8
    sender qref ending state=00000
    kxfxcp1                                                        [      50/     0]
           Sending parse to nprocs:4 slave_set:2 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< QC 和 set 2 进行通信
    2018-04-24 09:36:23.793843*:PX_Messaging:kxfp.c@4131:kxfprigeb():
    Get buffer on q=0x80d9eac90 qref=0x85e9dbf0 server=1.2 flags0x0 qstat=10000
    Got buffer on qref=0x85e9dbf0 qrser=14923778 qrseq=4 mh=0x7bff59fb8 fmh=a1 qstat=10011

    ......

           QC enabled kgl EXPRESS bit on slave for SQL:
    ----- Current SQL Statement for this session (sql_id=cay1xmzv2mtyz) -----
    INSERT/*+ parallel(4) */ INTO TAB001_WORK SELECT/*+ FULL(USR002) */ USR002.IM_PRO_CD,

    ......

    61394 行目:

           QC sends top nobj#:-1 ikc:0 #parts:1048576 flg:0
    kxfxpnd                                                        [      60/     0]
           size:  40 aligned:  40 total:8984 rem:7128 to:   4
           kxfxpoeobjv:65535, kxfxpoPMax:1048576, kxfxponobj:-1
           QC sends top nobj#:-1 ikc:0 #parts:1048576 flg:0
    kxfxpnd                                                        [      60/     0]
           size:  40 aligned:  40 total:9024 rem:7088 to:   4
           kxfxpoeobjv:65535, kxfxpoPMax:1048576, kxfxponobj:-1
           QC sends top nobj#:-1 ikc:0 #parts:1048576 flg:0
    kxfxpnd                                                        [      60/     0]
           size:  40 aligned:  40 total:9064 rem:7048 to:   4
           kxfxpoeobjv:65535, kxfxpoPMax:1048576, kxfxponobj:-1
           QC sends top nobj#:-1 ikc:0 #parts:1048576 flg:0
    kxfxpnd                                                        [      60/     0]
           size:  40 aligned:  40 total:9104 rem:7008 to:   4

    到最后,可以看到QC与Slave进程的通信结束了, Slave进程被释放:

    2018-04-24 09:36:26.599729*:PX_Messaging:kxfp.c@21306:kxfpIDNDeregister():
    removing link for qc 0x80d9eac90 sess 423 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< QC 与 Slave 进程的通信结束
    removing link 0x87544b030 for qc 0x80d9eac90 on list 7

    ......

    2018-04-24 09:36:26.608002*:PX_Messaging:kxfp.c@6477:kxfpGatherSlaveStats(begin):
    q=0x80d9eac90 qser=14923778
    2018-04-24 09:36:26.608002*:PX_Messaging:kxfp.c@6572:kxfpGatherSlaveStats(end):
    2018-04-24 09:36:26.608002*:PX_Messaging:kxfp.c@3102:kxfpqsod_qc_sod(): all slaves released qser=14923778 <<<<<<<<<<<<<  所有的 Slave 进程工作结束,被回收


    从上面就可以看到确实是完成了并行的工作。

    但是,为什么 OEM 报 没有并行,而是单进程执行? 初步估计是OEM 显示错了。进一步的分析就要移交给 OEM 团队通过 OEM 观点进行分析了。

  • 相关阅读:
    Quickuse.Ioc 快速应用.依赖注入组件
    Quickuse.Utility 快速应用.基础组件
    对System.ComponentModel.DataAnnotations 的学习应用
    C# List 转 Tree 公共方法
    C# 用Redis实现的分布式锁
    使用DbTableColumnWeb项目简要
    Application_Error VS OnException 遇到的坑
    在使用Intelligencia.UrlRewriter过程中 中文乱码问题
    C# MVC 页面静态化导致的问题
    关于.NetCore与.Netframework 对于DataSet的序列化与反序列化问题的探讨.
  • 原文地址:https://www.cnblogs.com/gaojian/p/8964573.html
Copyright © 2020-2023  润新知