• dentry的引用计数不对导致的crash


    [17528853.189372] python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
    [17528853.189377] python cpuset=0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c mems_allowed=0-1
    [17528853.189381] CPU: 27 PID: 396391 Comm: python Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.el7.x86_64 #1
    [17528853.189383] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 01/05/2019
    [17528853.189385] Call Trace:
    [17528853.189396] [<ffffffff9eb61dc1>] dump_stack+0x19/0x1b
    [17528853.189399] [<ffffffff9eb5c7ea>] dump_header+0x90/0x229
    [17528853.189405] [<ffffffff9e5b9dc6>] ? find_lock_task_mm+0x56/0xc0
    [17528853.189412] [<ffffffff9e631118>] ? try_get_mem_cgroup_from_mm+0x28/0x60
    [17528853.189414] [<ffffffff9e5ba274>] oom_kill_process+0x254/0x3d0
    [17528853.189417] [<ffffffff9e634f16>] mem_cgroup_oom_synchronize+0x546/0x570
    [17528853.189419] [<ffffffff9e634390>] ? mem_cgroup_charge_common+0xc0/0xc0
    [17528853.189422] [<ffffffff9e5bab04>] pagefault_out_of_memory+0x14/0x90
    [17528853.189424] [<ffffffff9eb5acf2>] mm_fault_error+0x6a/0x157
    [17528853.189429] [<ffffffff9eb6f7a8>] __do_page_fault+0x3c8/0x500
    [17528853.189431] [<ffffffff9eb6f915>] do_page_fault+0x35/0x90
    [17528853.189434] [<ffffffff9eb6b758>] page_fault+0x28/0x30
    [17528853.189438] Task in /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/b4168f4f943fd2df856ed1cd132a4441d4cf4cb429b620800b2dddb9e6873409 killed as a result of limit of /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df
    [17528853.189441] memory: usage 5242880kB, limit 5242880kB, failcnt 23453
    [17528853.189442] memory+swap: usage 5242880kB, limit 9007199254740988kB, failcnt 0
    [17528853.189444] kmem: usage 5180712kB, limit 9007199254740988kB, failcnt 0
    [17528853.189445] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df: cache:0KB rss:0KB rs
    s_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    [17528853.189458] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/b4168f4f943fd2df856ed1cd132a4441d4cf4cb429b620800b2dddb9e6873409: cache:0KB rss:48KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:48KB inactive_file:0KB active_file:0KB unevictable:0KB
    [17528853.189470] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c: cache:0KB rss:62120KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:62068KB inactive_file:0KB active_file:0KB unevictable:0KB
    [17528853.189482] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
    [17528853.189702] [391376] 0 391376 253 1 4 0 -998 pause
    [17528853.189716] [396268] 0 396268 1090 90 8 0 -998 dumb-init
    [17528853.189731] [396385] 0 396385 5358 408 16 0 -998 run.sh
    [17528853.189733] [396391] 0 396391 583729 28149 212 0 -998 python
    [17528853.189740] Memory cgroup out of memory: Kill process 391376 (pause) score 0 or sacrifice child
    [17528853.190878] Killed process 391376 (pause) total-vm:1012kB, anon-rss:4kB, file-rss:0kB, shmem-rss:0kB
    [17528853.207058] python invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=-998
    [17528853.207062] python cpuset=0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c mems_allowed=0-1
    [17528853.207066] CPU: 27 PID: 396391 Comm: python Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.el7.x86_64 #1
    [17528853.207067] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 01/05/2019
    [17528853.207069] Call Trace:
    [17528853.207081] [<ffffffff9eb61dc1>] dump_stack+0x19/0x1b
    [17528853.207086] [<ffffffff9eb5c7ea>] dump_header+0x90/0x229
    [17528853.207091] [<ffffffff9e5b9dc6>] ? find_lock_task_mm+0x56/0xc0
    [17528853.207097] [<ffffffff9e631118>] ? try_get_mem_cgroup_from_mm+0x28/0x60
    [17528853.207100] [<ffffffff9e5ba274>] oom_kill_process+0x254/0x3d0
    [17528853.207102] [<ffffffff9e634f16>] mem_cgroup_oom_synchronize+0x546/0x570
    [17528853.207105] [<ffffffff9e634390>] ? mem_cgroup_charge_common+0xc0/0xc0
    [17528853.207107] [<ffffffff9e5bab04>] pagefault_out_of_memory+0x14/0x90
    [17528853.207109] [<ffffffff9eb5acf2>] mm_fault_error+0x6a/0x157
    [17528853.207114] [<ffffffff9eb6f7a8>] __do_page_fault+0x3c8/0x500
    [17528853.207116] [<ffffffff9eb6f915>] do_page_fault+0x35/0x90
    [17528853.207119] [<ffffffff9eb6b758>] page_fault+0x28/0x30
    [17528853.207122] Task in /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c killed as a result of limit of /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df
    [17528853.207125] memory: usage 5242880kB, limit 5242880kB, failcnt 23746
    [17528853.207127] memory+swap: usage 5242880kB, limit 9007199254740988kB, failcnt 0
    [17528853.207128] kmem: usage 5180712kB, limit 9007199254740988kB, failcnt 0
    [17528853.207130] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    [17528853.207147] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/b4168f4f943fd2df856ed1cd132a4441d4cf4cb429b620800b2dddb9e6873409: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
    [17528853.207162] Memory cgroup stats for /kubepods/pode7c348fc-374d-11ea-94d5-6c92bfcc52df/0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c: cache:0KB rss:62168KB rss_huge:0KB---内存多占 mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:62156KB inactive_file:0KB active_file:0KB unevictable:0KB
    [17528853.207173] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
    [17528853.207328] [396268] 0 396268 1090 90 8 0 -998 dumb-init
    [17528853.207342] [396385] 0 396385 5358 408 16 0 -998 run.sh
    [17528853.207345] [396391] 0 396391 583729 28212 212 0 -998 python
    [17528853.207352] Memory cgroup out of memory: Kill process 396467 (python) score 0 or sacrifice child
    [17528853.208557] Killed process 396391 (python) total-vm:2334916kB, anon-rss:61720kB, file-rss:51128kB, shmem-rss:0kB
    [17528853.217780] ___slab_alloc: 24 callbacks suppressed
    [17528853.217783] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.217785] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.217787] node 0: slabs: 1, objs: 49, free: 0
    [17528853.217789] node 1: slabs: 0, objs: 0, free: 0
    [17528853.251423] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.251426] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.251428] node 0: slabs: 7, objs: 121, free: 12
    [17528853.251430] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252295] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252298] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252300] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252302] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252304] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252307] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252309] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252310] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252600] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252603] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252605] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252608] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252669] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252672] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252676] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252680] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252682] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252684] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252686] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252688] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252689] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252694] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252697] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252703] node 1: slabs: 2, objs: 24, free: 0
    [17528853.252728] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.252731] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.252733] node 0: slabs: 7, objs: 121, free: 12
    [17528853.252735] node 1: slabs: 2, objs: 24, free: 0
    [17528853.253048] SLUB: Unable to allocate memory on node -1 (gfp=0x80d0)
    [17528853.253051] cache: taskstats(13324:0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c), object size: 328, buffer size: 328, default order: 2, min order: 0
    [17528853.253053] node 0: slabs: 7, objs: 121, free: 12
    [17528853.253055] node 1: slabs: 2, objs: 24, free: 0
    [17528853.343423] device vvport87 left promiscuous mode
    [17528853.361205] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
    [17528853.362884] IP: [<ffffffff9e65deaa>] drop_nlink+0xa/0x50
    [17528853.364358] PGD 0
    [17528853.365789] Oops: 0000 [#1] SMP
    [17528853.367196] Modules linked in: fuse tcp_diag udp_diag inet_diag 8021q garp mrp nvidia_uvm(POE) veth ip6table_nat ip6_tables openvswitch nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 xt_set ip_set_bitmap_port ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set dummy xt_comment xt_mark ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat overlay(T) nf_conntrack_ipv4 nf_defrag_ipv4 ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nf_conntrack br_netfilter bridge stp llc bonding sunrpc dm_mirror dm_region_hash dm_log dm_mod iTCO_wdt iTCO_vendor_support xfs libcrc32c skx_edac coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd
    [17528853.376009] pcspkr sg joydev mei_me i2c_i801 mei lpc_ich ipmi_si acpi_power_meter ip_tables ext4 mbcache jbd2 nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) sd_mod crc_t10dif crct10dif_generic ast i2c_algo_bit drm_kms_helper crct10dif_pclmul crct10dif_common crc32c_intel ttm syscopyarea sysfillrect sysimgblt ixgbe fb_sys_fops drm ahci libahci megaraid_sas mdio ptp pps_core libata dca drm_panel_orientation_quirks wmi ipmi_devintf ipmi_msghandler nfit libnvdimm
    [17528853.382015] CPU: 29 PID: 396548 Comm: nvidia-containe Kdump: loaded Tainted: P OE ------------ T 3.10.0-957.el7.x86_64 #1
    [17528853.384996] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 01/05/2019
    [17528853.386496] task: ffff8fc225aee180 ti: ffff8fbf06cec000 task.ti: ffff8fbf06cec000
    [17528853.387985] RIP: 0010:[<ffffffff9e65deaa>] [<ffffffff9e65deaa>] drop_nlink+0xa/0x50
    [17528853.389474] RSP: 0018:ffff8fbf06cefda0 EFLAGS: 00010202
    [17528853.390936] RAX: 0000000000000001 RBX: ffff8fd5b42cf500 RCX: 0000002600000000
    [17528853.392398] RDX: 00000000ffffffd9 RSI: ffff8fd5b42cf500 RDI: 0000000000000000
    [17528853.393835] RBP: ffff8fbf06cefda8 R08: 000000000001f0c0 R09: ffffffff9e658bae
    [17528853.395265] R10: ffff8fd17fa5f0c0 R11: fffff46641c1bb80 R12: ffff8ff11d065a30
    [17528853.396681] R13: ffff8fd5b42cf500 R14: 0000000000000000 R15: ffff8fc069b58600
    [17528853.398080] FS: 00007f7a83f9d740(0000) GS:ffff8fd17fa40000(0000) knlGS:0000000000000000
    [17528853.399472] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [17528853.400851] CR2: 0000000000000048 CR3: 0000002d61ff2000 CR4: 00000000007607e0
    [17528853.402227] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [17528853.403587] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [17528853.404914] PKRU: 55555554
    [17528853.406224] Call Trace:
    [17528853.407512] [<ffffffff9e66b1ec>] simple_rmdir+0x2c/0x50
    [17528853.408792] [<ffffffff9e525888>] cgroup_destroy_locked+0x238/0x370
    [17528853.410057] [<ffffffff9e5259e2>] cgroup_rmdir+0x22/0x40
    [17528853.411303] [<ffffffff9e64dc2c>] vfs_rmdir+0xdc/0x150
    [17528853.412528] [<ffffffff9e6530d1>] do_rmdir+0x1f1/0x220
    [17528853.413732] [<ffffffff9e647034>] ? SYSC_newfstatat+0x24/0x60
    [17528853.414916] [<ffffffff9e654335>] SyS_unlinkat+0x25/0x40
    [17528853.416087] [<ffffffff9eb74ddb>] system_call_fastpath+0x22/0x27
    [17528853.417239] Code: 00 c7 47 48 00 00 00 00 48 8b 47 28 f0 48 ff 80 e0 03 00 00 5d c3 90 66 2e 0f 1f 84 00 00
    00 00 00 0f 1f 44 00 00 55 48 89 e5 53 <8b> 47 48 48 89 fb 85 c0 74 19 83 e8 01 85 c0 89 43 48 75 0c 48
    [17528853.419656] RIP [<ffffffff9e65deaa>] drop_nlink+0xa/0x50
    [17528853.420811] RSP <ffff8fbf06cefda0>
    [17528853.421940] CR2: 0000000000000048

    对应的dentry->d_inode 为NULL,调用链分析如下:

    cgroup_destroy_locked-->cgroup_d_remove_dir-->remove_dir-->simple_rmdir-->drop_nlink(oops)
    在执行 remove_dir--->simple_rmdir之前,会先 remove_dir-->d_delete ,

    void d_delete(struct dentry * dentry)
    {
        struct inode *inode;
        int isdir = 0;
        /*
         * Are we the only user?
         */
    again:
        spin_lock(&dentry->d_lock);
        inode = dentry->d_inode;
        isdir = S_ISDIR(inode->i_mode);
        if (dentry->d_lockref.count == 1) {//if we enter this 
            if (!spin_trylock(&inode->i_lock)) {
                spin_unlock(&dentry->d_lock);
                cpu_relax();
                goto again;
            }
            dentry->d_flags &= ~DCACHE_CANT_MOUNT;
            dentry_unlink_inode(dentry);//dentry.d_inode will be null
            fsnotify_nameremove(dentry, isdir);
            return;
        }
    
        if (!d_unhashed(dentry))
            __d_drop(dentry);
    
        spin_unlock(&dentry->d_lock);
    
        fsnotify_nameremove(dentry, isdir);
    }

    所以错误的原因应该是  dentry->d_lockref.count 的计数错误。

    crash> dentry.d_name,d_inode,d_lockref ffff8fd5b42cf500
      d_name = {
        {
          {
            hash = 135773531, 
            len = 64
          }, 
          hash_len = 275013680475
        }, 
        name = 0xffff8fa3fb67b060 "0d5ec7eb04debcc3d63c2ba64870de420de3ad04089fd107eed566dda66bb60c"
      }
      d_inode = 0x0
      d_lockref = {
        {
          lock_count = 4294967296, 
          {
            lock = {
              {
                rlock = {
                  raw_lock = {
                    val = {
                      counter = 0
                    }
                  }
                }
              }
            }, 
            count = 1---------不应该为1
          }
        }
      }

     分析了对应的dentry的 d_subdirs,已经为空,说明它下面的文件和目录都已经跟它脱离关系。

    static int cgroup_destroy_locked(struct cgroup *cgrp)
    {
    .....
        dget(d);//we add dentry->d_lockref.count,it will at least 2
        cgroup_d_remove_dir(d);
        dput(d);
    ....}

     比如它的一个子dentry如下:

    crash> dentry.d_parent,d_name  ffff8fded6884e40
      d_parent = 0xffff8fd5b42cf500--------------对应的父dentry就是本文出问题的dentry
      d_name = {
        {
          {
            hash = 2600622579, 
            len = 27
          }, 
          hash_len = 118564739571
        }, 
        name = 0xffff8fded6884e78 "memory.memsw.limit_in_bytes"---这个就是在cgroup中常见的内存限制文件
      }

    在这个dentry出问题之前,我们可以看到很多这个pod下的进程因为oom被kill掉,现在担心oom会引起了一些并发的操作,引起dentry的计数不对。

  • 相关阅读:
    diary and html 文本颜色编辑,行距和其它编辑总汇
    bash coding to changeNames
    virtualbox ubuntu 网络连接 以及 连接 secureCRT
    linux 学习6 软件包安装
    linux 学习8 权限管理
    vim 使用2 转载 为了打开方便
    ubuntu
    linux 学习15 16 启动管理,备份和恢复
    linux 学习 14 日志管理
    linux 学习 13 系统管理
  • 原文地址:https://www.cnblogs.com/10087622blog/p/12257151.html
Copyright © 2020-2023  润新知