• debug实战:Unmanaged High Memory非托管高内存


    最近又监控到一个高内存的问题,周五下班把系统打开,周末2天没关,周一来看已经涨到5.2G,这次与以往不同,不是.net的内存泄漏,而是非托管引起的。

    1. 抓dump,确定高内存的类型

    //dump有5.2G,.net的堆只有191M,可以判断是unmanaged high memory
    0:000> !eeheap -gc
    Number of GC Heaps: 1
    generation 0 starts at 0x0000000159852b88
    generation 1 starts at 0x00000001597d26a0
    generation 2 starts at 0x0000000003231000
    ephemeral segment allocation context: none
     segment     begin allocated  size
    0000000003230000  0000000003231000  000000000dad5f38  0xa8a4f38(176836408)
    00000001596b0000  00000001596b1000  0000000159db48f8  0x7038f8(7354616)
    Large object heap starts at 0x0000000013231000
     segment     begin allocated  size
    0000000013230000  0000000013231000  0000000013906b30  0x6d5b30(7166768)
    Total Size:              Size: 0xb67e360 (191357792) bytes.
    ------------------------------
    GC Heap Size:            Size: 0xb67e360 (191357792) bytes.
    

    2. 用windbg找原因

    于是不能用sos ext里的命令了,用CodeProject上的这篇文章里的方法试了一下,如下:

    //这个命令能看到所有的内存消耗,确实分配了5G多
    0:000> !address -summary	                                     
    Mapping file section regions...
    Mapping module regions...
    Mapping PEB regions...
    Mapping TEB and stack regions...
    *** ERROR: Symbol file could not be found.  Defaulted to export symbols for DebussyReverie.dll - 
    Mapping heap regions...
    Mapping page heap regions...
    Mapping other regions...
    Mapping stack trace database regions...
    Mapping activation context regions...
    --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    Free                                    412      7fe`897c2000 (   7.994 Tb)           99.93%
    Heap                                    628        1`24c81000 (   4.575 Gb)  78.18%    0.06%
    <unknown>                              1217        0`39381000 ( 915.504 Mb)  15.28%    0.01%
    Image                                  2120        0`14561000 ( 325.379 Mb)   5.43%    0.00%
    Stack                                   200        0`04080000 (  64.500 Mb)   1.08%    0.00%
    Other                                    13        0`001c6000 (   1.773 Mb)   0.03%    0.00%
    TEB                                      66        0`00084000 ( 528.000 kb)   0.01%    0.00%
    PEB                                       1        0`00001000 (   4.000 kb)   0.00%    0.00%
    --- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_PRIVATE                            1509        1`59760000 (   5.398 Gb)  92.24%    0.07%
    MEM_IMAGE                              2683        0`184d0000 ( 388.813 Mb)   6.49%    0.00%
    MEM_MAPPED                               53        0`04bfe000 (  75.992 Mb)   1.27%    0.00%
    --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_FREE                                412      7fe`897c2000 (   7.994 Tb)           99.93%
    MEM_COMMIT                             3267        1`4d29f000 (   5.206 Gb)  88.96%    0.06%
    MEM_RESERVE                             978        0`2958f000 ( 661.559 Mb)  11.04%    0.01%
    
    //再查看非托管堆,能看到有1个堆占了4.5G
    0:000> !heap -s
    LFH Key                   : 0x000000ff3fbc76ee
    Termination on corruption : ENABLED
              Heap     Flags   Reserv  Commit  Virt   Free  List   UCR  Virt  Lock  Fast 
                                (k)     (k)    (k)     (k) length      blocks cont. heap 
    -------------------------------------------------------------------------------------
    Virtual block: 0000000002ca0000 - 0000000002ca0000 (size 0000000000000000)
    ...
    000000001fc30000 00001002 4597824 4568704 4597824    568   296   288    0      1   LFH
    ...
        External fragmentation  23 % (59 free blocks)
    -------------------------------------------------------------------------------------
    
    //查看有问题的堆,大小为27d1的某个内存块共有6cc72个,27d1*6cc72=10eb24712(4.5G)
    0:000> !heap -stat -h 000000001fc30000 
     heap @ 000000001fc30000
    group-by: TOTSIZE max-display: 20
        size     #blocks     total     ( %) (percent of total busy bytes)
        27d1 6cc72 - 10eb24712  (99.95)
        128 18d5 - 1cb648  (0.04)
        2740 11 - 29b40  (0.00)
        3e80 1 - 3e80  (0.00)
    
    //用size=27d1过滤,查看每个对象的指针UserPtr
    0:000> !heap -flt s 27d1
        _HEAP @ 90000
    	...
        _HEAP @ 1fc30000
                  HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state
            0000000025217420 0281 0000  [00]   0000000025217430    027d1 - (busy)
            0000000025219c30 0281 0281  [00]   0000000025219c40    027d1 - (busy)
    
    //随便找个对象地址,看被哪里引用了,悲剧的是,不管找哪个,都没有堆栈信息stack trace!
    0:000> !heap -p -a 0000000025217430    
        address 0000000025217430 found in
        _HEAP @ 1fc30000
                  HEAP_ENTRY Size Prev Flags            UserPtr UserSize - state
            0000000025217420 0281 0000  [00]   0000000025217430    027d1 - (busy)
    

    3. 用DebugDiag找原因

    上面的路目前看起来是走不大通了,于是换Debug大神-美女程序猿-Tess的这篇博文里的方法,改用DebugDiag试试。

    1. 下载DebugDiag1.2,这是目前最新的版本,已经能支持win7和x64位的系统了。根据你的os选择x86/x64的msi版本并安装。
    2. 重新启动你的程序,因为这个工具是在跟踪tracking内存分配的过程,所以需要在high memory出现之前就开始监控。
    3. 打开DebugDiag,选择Add Rule、Native (non-.NET) Memory and Handle,选择要分析的进程,Rule->Status会显示目前处于tracking的状态。
    4. 执行一段时间、1h 或 1天,转到tab页Processes里右键进程、Create Full Userdump。
    5. 然后转到Advanced Analysis里选中"Memory Pressure Analyzers"脚本,点击Add Data Files把生成的dump加进去。
    6. 点击"Start Analysis"就开始分析了,要耐心一点,视dump大小一般需要>10min。

    因为要重新启动,下面的dump是我重新抓的,跑了一晚上,大概有1.7G,分析的结果显示是一个DebussyReverie.dll的模块在不停的分配内存。

    DebussyReverie.dll is responsible for 1.15 GBytes worth of outstanding allocations. The following are the top 2 memory consuming functions:	
    
    DebussyReverie!GetLspGuid+79e: 1.07 GBytes worth of outstanding allocations.	
    DebussyReverie!GetLspGuid+f31df: 83.04 MBytes worth of outstanding allocations.
    
    Image Name: C:Program FilesLeagSoftUniAccess AgentDebussyReverie.dll 
    Function   DebussyReverie!GetLspGuid+79e 
    Allocation type   Heap allocation(s) 
    Heap handle   0x00000000`00000000 
    Allocation Count   114716 allocation(s) 
    Allocation Size   1.07 GBytes 
    Leak Probability   81% 
    
    //堆栈如下,可惜没能显示从我的代码到问题代码之间的调用关系
    Function   Source   Destination 
    LeakTrack+13277       
    DebussyReverie!GetLspGuid+79e       
    DebussyReverie!GetLspGuid+647       
    DebussyReverie!GetLspGuid+ea3       
    DebussyReverie!WSPStartup+431       
    ws2_32!WahOpenApcHelper+20a       
    ws2_32!_WSAFDIsSet+e02       
    ws2_32!WSCInstallProviderAndChains64_32+1971       
    ws2_32!WSCInstallProviderAndChains64_32+1aa0       
    ws2_32!WSASendTo+21fc       
    ws2_32!getsockname+4a       
    DebussyReverie!PublicService+74b       
    DebussyReverie!GetLspGuid+a41dd       
    DebussyReverie!PublicService+2c70       
    ws2_32!WSApSetPostRoutine+298       
    ws2_32!WSAConnect+af       
    System_ni+32f3f2       
    ...     
    0x7FE8925FEA7 
    

    看起来,问题来自公司安装的网络监控软件LeagSoft里的DebussyReverie.dll。大约是下班后服务器断开了,项目里Apache.NMS.ActiveMQ连接出错时调用多线程反复重连,然后被监控软件拦截到后出的错。应该是LeagSoft里的一个Bug,如果能和他们交流一下,应该能进一步确定问题所在,甚至修复这个问题。

  • 相关阅读:
    IE10 下兼容性问题
    前端面试题十九
    前端面试题十八
    前端面试题十七
    前端面试题十六
    前端面试题十五
    前端面试题十四
    前端面试题十三(兼容)
    前端面试题十二
    前端面试题十一
  • 原文地址:https://www.cnblogs.com/AlexanderYao/p/5337157.html
Copyright © 2020-2023  润新知