• A .NET case study: analyzing a memory surge in a recruiting site's back-end service


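    I: Background

    1. The story

    A while ago a friend reached out to me on WeChat. His program's memory surged periodically and he wanted help fixing it. After talking it through with him, I learned that memory normally sits at around 5 GB, but around certain points in time it jumps to 10 GB+. Drawn as a chart, memory hovers around 5 GB with periodic spikes above 10 GB.

    So the next step is to find out what that mysterious extra 5-6 GB actually is. Let's let WinDbg do the talking.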

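    II: WinDbg analysis

    1. Managed or unmanaged?

    From the description this is most likely a managed-side problem, but for the sake of completeness let's still check with !address -summary and !eeheap -gc.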

    
    0:000> !address -summary
    
    --- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    Free                                   1164      7f5`58f12000 (   7.958 TB)           99.48%
    <unknown>                              6924        a`6de84000 (  41.717 GB)  97.90%    0.51%
    Stack                                  1123        0`16340000 ( 355.250 MB)   0.81%    0.00%
    Image                                  4063        0`1607d000 ( 352.488 MB)   0.81%    0.00%
    Heap                                     71        0`0c9ea000 ( 201.914 MB)   0.46%    0.00%
    TEB                                     374        0`002ec000 (   2.922 MB)   0.01%    0.00%
    Other                                    13        0`001c6000 (   1.773 MB)   0.00%    0.00%
    PEB                                       1        0`00001000 (   4.000 kB)   0.00%    0.00%
    
    --- Type Summary (for busy) ------ RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_PRIVATE                            5423        a`87200000 (  42.111 GB)  98.83%    0.51%
    MEM_IMAGE                              7033        0`1e5d6000 ( 485.836 MB)   1.11%    0.01%
    MEM_MAPPED                              113        0`01908000 (  25.031 MB)   0.06%    0.00%
    
    --- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
    MEM_FREE                               1164      7f5`58f12000 (   7.958 TB)           99.48%
    MEM_RESERVE                            4165        8`1b873000 (  32.430 GB)  76.11%    0.40%
    MEM_COMMIT                             8404        2`8b86b000 (  10.180 GB)  23.89%    0.12%
    
    
    0:000> !eeheap -gc
    Number of GC Heaps: 32
    ------------------------------
    Heap 0 (00000000004106d0)
    generation 0 starts at 0x0000000082eb0e58
    generation 1 starts at 0x0000000082d79b20
    generation 2 starts at 0x000000007fff1000
    ephemeral segment allocation context: none
             segment             begin         allocated              size
    000000007fff0000  000000007fff1000  0000000083f80128  0x3f8f128(66646312)
    Large object heap starts at 0x000000087fff1000
             segment             begin         allocated              size
    000000087fff0000  000000087fff1000  0000000883fe4190  0x3ff3190(67056016)
    0000000927ff0000  0000000927ff1000  000000092bfe2430  0x3ff1430(67048496)
    0000000a81c50000  0000000a81c51000  0000000a8221c858  0x5cb858(6076504)
    Heap Size:               Size: 0xc53ef40 (206827328) bytes.
    ------------------------------
    ...
    Heap 31 (0000000019c84130)
    generation 0 starts at 0x0000000844fc5170
    generation 1 starts at 0x0000000844f851f8
    generation 2 starts at 0x000000083fff1000
    ephemeral segment allocation context: none
             segment             begin         allocated              size
    000000083fff0000  000000083fff1000  0000000845171ca0  0x5180ca0(85462176)
    Large object heap starts at 0x00000008fbff1000
             segment             begin         allocated              size
    00000008fbff0000  00000008fbff1000  00000008fffe2290  0x3ff1290(67048080)
    000000094bff0000  000000094bff1000  000000094ea2ebb8  0x2a3dbb8(44293048)
    000000096bff0000  000000096bff1000  000000096dbdec00  0x1bedc00(29285376)
    Heap Size:               Size: 0xd79d6e8 (226088680) bytes.
    ------------------------------
    GC Heap Size:            Size: 0x1f1986a88 (8348265096) bytes.
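
The two outputs above use different units: !address prints binary gigabytes, while !eeheap prints a raw byte count. As a rough sketch of how the two figures can be put side by side, the snippet below converts both and computes the managed share of committed memory. The helper names `toGiB` and `managedShare` are made up for this illustration; the constants are copied from the output.

```javascript
"use strict";

// Figures copied from the debugger output above.
const committedBytes = 0x28b86b000; // MEM_COMMIT total from !address -summary
const gcHeapBytes = 0x1f1986a88;    // "GC Heap Size" from !eeheap -gc

// Format a byte count in binary gigabytes (GiB), the unit !address labels "GB".
function toGiB(bytes) {
    return (bytes / 2 ** 30).toFixed(2);
}

// Share of committed memory that belongs to the managed heap, in whole percent.
function managedShare(gcBytes, commitBytes) {
    return Math.round((gcBytes / commitBytes) * 100);
}

console.log("committed: " + toGiB(committedBytes) + " GiB");
console.log("GC heap:   " + toGiB(gcHeapBytes) + " GiB");
console.log("managed share: " + managedShare(gcHeapBytes, committedBytes) + "%");
```

Either way the numbers are read, the managed heap makes up roughly three quarters of the committed memory.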
    
    

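    Reading the output, of the roughly 10 GB committed, the managed heap accounts for about 8.3 GB (GC Heap Size: 8,348,265,096 bytes), so this is clearly a managed-layer problem. With the general direction settled, the next step is to look inside the managed heap. Past experience says the program must have created a huge number of class instances, so let's run !dumpheap -stat.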

    
    0:000> !dumpheap -stat
    Statistics:
                  MT    Count    TotalSize Class Name
    ...
    000007fe9ddd5fc0   341280     30032640 System.ServiceModel.Description.MessagePartDescription
    000007fe9c4865a0   866349     41584752 System.Xml.XmlDictionaryString
    000007fe9defb098   937801     45014448 System.Xml.XmlDictionaryString
    000007fe9c66bd28   105052     45086880 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Xml.XmlDictionaryString, System.Runtime.Serialization]][]
    000007fe9e0f4d20   113299     49050864 System.Collections.Generic.Dictionary`2+Entry[[System.String, mscorlib],[System.Xml.XmlDictionaryString, System.Runtime.Serialization]][]
    00000000003c9190    44573    618414438      Free
    000007fef8f6c168   428410   1209974642 System.Char[]
    000007fef8f4f1b8  2849758   1246912848 System.Object[]
    000007fef8f6f058   531963   1670620873 System.Byte[]
    000007fef8f6aee0  2368431   2382587716 System.String
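
To get a feel for how much of the heap these last four rows represent, here is a small sketch that parses them and relates their total to the GC heap size reported by !eeheap -gc. The function name `parseStatLine` is an assumption of this example, not part of SOS; the rows are copied from the statistics above.

```javascript
"use strict";

// The four largest rows of the !dumpheap -stat output above, copied verbatim.
const statLines = [
    "000007fef8f6c168   428410   1209974642 System.Char[]",
    "000007fef8f4f1b8  2849758   1246912848 System.Object[]",
    "000007fef8f6f058   531963   1670620873 System.Byte[]",
    "000007fef8f6aee0  2368431   2382587716 System.String",
];
const gcHeapBytes = 8348265096; // "GC Heap Size" reported by !eeheap -gc

// Split one statistics row into method table, count, total size and class name.
function parseStatLine(line) {
    const match = /^([0-9a-f]+)\s+(\d+)\s+(\d+)\s+(.+)$/i.exec(line.trim());
    if (!match) return null;
    return { mt: match[1], count: Number(match[2]), totalSize: Number(match[3]), name: match[4] };
}

const rows = statLines.map(parseStatLine).filter((row) => row !== null);
const topBytes = rows.reduce((sum, row) => sum + row.totalSize, 0);

for (const row of rows) {
    const avg = Math.round(row.totalSize / row.count);
    console.log(row.name + ": avg " + avg + " bytes per object");
}
console.log("share of GC heap: " + Math.round((topBytes / gcHeapBytes) * 100) + "%");
```

The per-type averages are small, which suggests that a handful of very large instances are hidden among millions of small ones. That is why grouping by size is the more useful view.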
    
    

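    Fate likes to play tricks: past experience did not pay off this time. The biggest consumers are all basic types such as Byte[], String, Char[] and Object[], and these are hard to investigate. You either keep filtering with -min and -max, or you write a script that groups the objects by size and sorts them. My crude script is below: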

    
    "use strict";
    
    /*
       Group the objects of a managed-heap type (given by MT) by their size
    */
    
    let platform = 64
    let mtlist = ["000007fef8f4f1b8"];
    let maxlimit = 100;
    
    function initializeScript() { return [new host.apiVersionSupport(1, 7)]; }
    function log(str) { host.diagnostics.debugLog(str + "\n"); }
    function exec(str) { log("\n" + str); return host.namespace.Debugger.Utility.Control.ExecuteCommand(str); }
    function invokeScript() { for (var mt of mtlist) { groupby_mtsize_inheap(mt); } }
    
    //group the objects of one type by size
    function groupby_mtsize_inheap(mt) {
        var size_group = {};
        var commandText = "!dumpheap -mt " + mt;
        var output = exec(commandText);
        for (var line of output) {
            if (line == "" || line.indexOf("Address") > -1) continue;
            if (line.indexOf("Statistics") > -1) break;
            var size = parseInt(line.substring(Math.ceil(platform / 2) + 1).trim(), 10);
            if (isNaN(size)) continue; //skip lines that are not object rows
    
            if (!size_group[size]) size_group[size] = 0;
    
            size_group[size]++;
        }
        show_top10_format(mt, size_group);
    }
    
    function show_top10_format(mt, size_group) {
        var maparr = [];
    
        //convert the map to an array
        for (var size in size_group) {
            maparr.push({ "size": size, "count": size_group[size], "totalsize": (size * size_group[size]) });
        }
    
        maparr.sort(function (a, b) { return b.totalsize - a.totalsize });
    
        var topTotalSize = 0;
    
        //print the groups, largest total size first
        for (var i = 0; i < Math.min(maparr.length, maxlimit); i++) {
            var size = maparr[i].size;
            var count = maparr[i].count;
            var totalsize = Math.round(maparr[i].totalsize / 1024 / 1024); //whole MB
    
            topTotalSize += totalsize;
    
            log("size=" + size + ",count=" + count + ",totalsize=" + totalsize + "M");
        }
    
        log("Total:" + topTotalSize + "M");
    
        //show max: list the addresses of the objects in the largest group
        if (maparr.length > 0) {
            var size = maparr[0].size;
            var output = exec("!dumpheap -mt " + mt + " -min 0n" + size + " -max 0n" + size + " -short").Take(maxlimit);
            for (var line of output) {
                log(line);
            }
        }
    }
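
The core of the script is independent of the debugger: it reads the size column, counts objects per size and ranks the groups by total bytes. The sketch below shows that logic on a few sample rows so it can be tried outside WinDbg. Only the first address is taken from this dump; the other two rows are invented for illustration, and `groupBySize` and `rankByTotalSize` are names chosen for this example. Unlike the script above, it splits on whitespace instead of using a fixed column offset.

```javascript
"use strict";

// Sample "!dumpheap -mt" rows: Address, MT, Size. Only the first address comes
// from this dump; the other two rows are made up for illustration.
const sampleLines = [
    "         Address               MT     Size",
    "0000000a61c51000 000007fef8f6aee0 31116490",
    "0000000a00001000 000007fef8f6aee0 29285946",
    "0000000a00002000 000007fef8f6aee0 29285946",
    "",
    "Statistics:",
];

// Count objects per distinct size, stopping at the statistics footer.
function groupBySize(lines) {
    const groups = new Map();
    for (const line of lines) {
        if (line.indexOf("Statistics") > -1) break;
        const fields = line.trim().split(/\s+/);
        const size = parseInt(fields[2], 10);
        if (fields.length < 3 || Number.isNaN(size)) continue; // header or blank line
        groups.set(size, (groups.get(size) || 0) + 1);
    }
    return groups;
}

// Turn the groups into rows sorted by total bytes, largest first.
function rankByTotalSize(groups) {
    return Array.from(groups, ([size, count]) => ({ size, count, totalSize: size * count }))
        .sort((a, b) => b.totalSize - a.totalSize);
}

const ranked = rankByTotalSize(groupBySize(sampleLines));
for (const row of ranked) {
    console.log("size=" + row.size + ",count=" + row.count +
        ",totalsize=" + Math.round(row.totalSize / 1024 / 1024) + "M");
}
```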
    
    
    

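    Next, feed the method table address of String (000007fef8f6aee0) to the script through mtlist and look at the sorted result. The simplified output is: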

    
    !dumpheap -mt 000007fef8f6aee0
    size=29285946,count=2,totalsize=56M
    size=29285540,count=2,totalsize=56M
    size=29285502,count=2,totalsize=56M
    size=29285348,count=2,totalsize=56M
    size=27455186,count=2,totalsize=52M
    size=31116504,count=1,totalsize=30M
    size=31116490,count=1,totalsize=30M
    size=31116306,count=1,totalsize=30M
    size=31115934,count=1,totalsize=30M
    size=31115920,count=1,totalsize=30M
    size=31115718,count=1,totalsize=30M
    size=29286342,count=1,totalsize=28M
    size=29285898,count=1,totalsize=28M
    ...
    Total:1198M
    
    

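    As you can see, there are quite a few very large strings. So what are they? I picked a few at random and exported them to text files to take a look. Here is one of them: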

    
    0:000> !dumpheap -mt 000007fef8f6aee0 -min 0n31116490 -max 0n31116490 -short 
    0000000a61c51000
    0:000> !do 0000000a61c51000 
    Name:        System.String
    MethodTable: 000007fef8f6aee0
    EEClass:     000007fef88d3720
    Size:        31116490(0x1daccca) bytes
    File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
    String:      <String is invalid or too large to print>
    
    Fields:
                  MT    Field   Offset                 Type VT     Attr            Value Name
    000007fef8f6dc90  40000aa        8         System.Int32  1 instance         15558232 m_stringLength
    000007fef8f6c1c8  40000ab        c          System.Char  1 instance               50 m_firstChar
    000007fef8f6aee0  40000ac       18        System.String  0   shared           static Empty
                                     >> Domain:Value  00000000003fb620:NotInit  000000001ca30bd0:NotInit  000000001f7b21a0:NotInit  000000001f8940c0:NotInit  0000000027dc46b0:NotInit  00000000281bd720:NotInit  00000000282b7ee0:NotInit  <<
    
    0:000> .writemem D:\dumps\xxxx\string.txt 0000000a61c51000 L?0x1daccca
    Writing 1daccca bytes..........
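
The length passed to .writemem is the object size from !do, and that size can be cross-checked against m_stringLength. A quick sketch of the arithmetic follows. The overhead figure is simply what remains in this dump after subtracting the character data; it is derived here, not quoted from documentation.

```javascript
"use strict";

// Values copied from the !do output above.
const objectSize = 0x1daccca;  // Size: 31116490 bytes
const stringLength = 15558232; // m_stringLength, in UTF-16 code units

// Bytes occupied by the characters themselves (2 bytes per UTF-16 code unit).
const charBytes = stringLength * 2;

// Whatever is left is per-object overhead (header, method table pointer,
// length field, terminator). Derived from this dump, not a documented constant.
const overheadBytes = objectSize - charBytes;

console.log("character data: " + charBytes + " bytes");
console.log("overhead:       " + overheadBytes + " bytes");
console.log("size in MiB:    " + (objectSize / 2 ** 20).toFixed(1));
```

At close to 30 MiB, a single string like this is far above the 85,000-byte threshold for the large object heap, so every copy made of it is another LOH allocation.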
    
    

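    Judging from the content, it is simply the Base64 encoding of a PDF. Investigating char[] and byte[] in the same way shows that most of them are PDFs as well. My guess is that while processing PDFs the program converts back and forth between byte[], char[] and string, so in theory most of these objects are unrooted. In fact, !heapstat -iu also shows roughly 5.5 GB of unrooted objects waiting for the GC to reclaim them.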

    
    0:000> !heapstat -iu										
    Heap             Gen0         Gen1         Gen2          LOH										
    Heap0        17625808      1274680     47745824    140181016										
    ...									
    Total       357486256     28100616   2229673376   5733004848										
    										
    Free space:                                                 Percentage										
    Heap0         3962240           24     11211224       298616SOH: 22% LOH:  0%										
    Heap1         5625856          144      9857168       302152SOH: 27% LOH:  0%										
    ...									
    Heap31        1448576           24     19957312       218024SOH: 25% LOH:  0%										
    Total       181492784         1136    431825856      5183128										
    										
    Unrooted objects:                                           Percentage										
    Heap0        12163928       243584        42872    137153536SOH: 18% LOH: 97%										
    ...									
    Heap31         236832       239272      1435840    139770656SOH:  2% LOH: 99%										
    Total       164954952      7948448     29066480   5530423784										
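
To make the "Total" rows easier to read, the sketch below sums them and expresses the unrooted part as a percentage. The numbers are copied from the output above; the helpers `sum` and `percent` are just names used in this example.

```javascript
"use strict";

// "Total" rows copied from the !heapstat -iu output above (bytes per generation).
const total = { gen0: 357486256, gen1: 28100616, gen2: 2229673376, loh: 5733004848 };
const unrooted = { gen0: 164954952, gen1: 7948448, gen2: 29066480, loh: 5530423784 };

// Add up all four columns of one row.
function sum(row) {
    return row.gen0 + row.gen1 + row.gen2 + row.loh;
}

// Whole-number percentage of part relative to whole.
function percent(part, whole) {
    return Math.round((part / whole) * 100);
}

console.log("heap total:        " + sum(total) + " bytes");
console.log("unrooted total:    " + sum(unrooted) + " bytes");
console.log("unrooted in LOH:   " + percent(unrooted.loh, total.loh) + "% of the LOH");
console.log("unrooted overall:  " + percent(sum(unrooted), sum(total)) + "% of the heap");
```

The heap total matches the GC Heap Size reported by !eeheap -gc, and nearly all of the unrooted bytes sit in the large object heap.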
    
    

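    III: Summary

    This periodic memory surge was mainly caused by the program receiving too many PDF files from upstream. These are all large objects, and they are additionally converted between char[], string and byte[], which leads to excessive memory usage within a short time.

    Finally, my personal suggestions:

    1. For the large volume of PDFs, consider offloading them to a third-party object storage service (OSS) to avoid unnecessary memory usage.

    2. Consider rate limiting the cleansing service, or spreading the load across multiple service instances.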
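
As one possible reading of the rate-limiting suggestion, the sketch below caps the number of payload bytes that may be in flight at once instead of counting requests, since a few 30 MB documents matter more here than many small ones. The class `ByteBudget`, its methods and the 64 MiB limit are all hypothetical and not taken from the friend's service. It is written in JavaScript only to stay consistent with the script above; the same idea carries over to a .NET service.

```javascript
"use strict";

// Hypothetical admission control: cap the bytes of PDF payloads processed at once.
class ByteBudget {
    constructor(limitBytes) {
        this.limitBytes = limitBytes;
        this.inFlightBytes = 0;
    }

    // Reserve room for a payload; returns false when the caller should wait or reject.
    tryAcquire(payloadBytes) {
        if (this.inFlightBytes + payloadBytes > this.limitBytes) return false;
        this.inFlightBytes += payloadBytes;
        return true;
    }

    // Give the reservation back once the payload has been fully processed.
    release(payloadBytes) {
        this.inFlightBytes = Math.max(0, this.inFlightBytes - payloadBytes);
    }
}

const MiB = 2 ** 20;
const budget = new ByteBudget(64 * MiB);    // the limit is an arbitrary example value
const first = budget.tryAcquire(30 * MiB);  // admitted
const second = budget.tryAcquire(30 * MiB); // admitted, 60 MiB now in flight
const third = budget.tryAcquire(30 * MiB);  // refused, would exceed the limit
budget.release(30 * MiB);                   // one payload finished
const retry = budget.tryAcquire(30 * MiB);  // admitted after the release
console.log(first, second, third, retry);
```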

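    I later heard from my friend that he solved the problem by adding filtering and optimizing some of the business flows. I'm sure many of you have run into this kind of problem in practice, so feel free to leave a comment with your own solution.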

  • Original article: https://www.cnblogs.com/huangxincheng/p/15409807.html