• An unfinished analysis of a memory-corruption crash on the 2.6.32-220 kernel


    We hit a crash; the log is as follows:
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffff81166504>] s_show+0xe4/0x330
    PGD 1158954067 PUD 12666d8067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0/host0/port-0:0/expander-0:0/port-0:0:6/end_device-0:0:6/target0:0:6/0:0:6:0/block/sdw/stat
    CPU 11
    Modules linked in: **********************
    Pid: 7739, comm: slabtop Not tainted 2.6.32-220.el6.x86_64 #1 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: 0010:[<ffffffff81166504>]  [<ffffffff81166504>] s_show+0xe4/0x330
    RSP: 0018:ffff8817fc9e1d98  EFLAGS: 00010086
    RAX: ffff880c2fc217c0 RBX: 00000000000003fb RCX: ffff880c2fc21800
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880c2fc217d0
    RBP: ffff8817fc9e1e18 R08: 0000000000000001 R09: 0000000000000001
    R10: ffffffff817a234e R11: 0000000000000246 R12: 00000000000003fb
    R13: ffffffff817a234e R14: 0000000000000400 R15: 0000000000000000
    FS:  00007feb7bb6b700(0000) GS:ffff8800283c0000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 000000125178a000 CR4: 00000000000006e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process slabtop (pid: 7739, threadinfo ffff8817fc9e0000, task ffff8817facd2100)
    Stack:
     ffff8812666d8080 ffff881158954000 ffff881700000000 ffff880c2fc21800
    <0> ffff880c2fc217c0 ffff8817f3eee740 ffff880c2fd18498 0000000000000000
    <0> 0000000000000000 ffff880c2fd10440 ffff8817fc9e1e18 ffff8817f3eee740
    Call Trace:
     [<ffffffff811a0a35>] seq_read+0xe5/0x3f0
     [<ffffffff811e35be>] proc_reg_read+0x7e/0xc0
     [<ffffffff8117ea75>] vfs_read+0xb5/0x1a0
     [<ffffffff810d68c2>] ? audit_syscall_entry+0xc2/0x2b0
     [<ffffffff8117ebb1>] sys_read+0x51/0x90
     [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
    Code: 10 48 39 fe 74 34 4c 8b 45 c8 45 8b 88 18 80 00 00 45 89 c8 0f 1f 00 4d 85 ed 75 0f 44 39 4e 20 49 c7 c2 4e 23 7a 81 4d 0f 45 ea <48> 8b 36 4d 01 c4 48 83 c3 01 48 39 fe 75 dd 48 8b 30 48 39 f0
    RIP  [<ffffffff81166504>] s_show+0xe4/0x330
     RSP <ffff8817fc9e1d98>
    CR2: 0000000000000000

    The backtrace is as follows:

    crash> bt
    PID: 7739   TASK: ffff8817facd2100  CPU: 11  COMMAND: "slabtop"
    bt: invalid kernel virtual address: 776f645f7570635f  type: "cpu_online_map"
     #0 [ffff8817fc9e1960] machine_kexec at ffffffff8103244b
     #1 [ffff8817fc9e19c0] crash_kexec at ffffffff810baf92
     #2 [ffff8817fc9e1a90] oops_end at ffffffff814fded0
     #3 [ffff8817fc9e1ac0] no_context at ffffffff810425db
     #4 [ffff8817fc9e1b10] __bad_area_nosemaphore at ffffffff81042865
     #5 [ffff8817fc9e1b60] bad_area at ffffffff8104298e
     #6 [ffff8817fc9e1b90] __do_page_fault at ffffffff810430c0
     #7 [ffff8817fc9e1cb0] do_page_fault at ffffffff814ffefe
     #8 [ffff8817fc9e1ce0] page_fault at ffffffff814fd255
        [exception RIP: s_show+228]
        RIP: ffffffff81166504  RSP: ffff8817fc9e1d98  RFLAGS: 00010086
        RAX: ffff880c2fc217c0  RBX: 00000000000003fb  RCX: ffff880c2fc21800
        RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff880c2fc217d0
        RBP: ffff8817fc9e1e18   R8: 0000000000000001   R9: 0000000000000001
        R10: ffffffff817a234e  R11: 0000000000000246  R12: 00000000000003fb
        R13: ffffffff817a234e  R14: 0000000000000400  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #9 [ffff8817fc9e1e20] seq_read at ffffffff811a0a35
    #10 [ffff8817fc9e1ea0] proc_reg_read at ffffffff811e35be
    #11 [ffff8817fc9e1ef0] vfs_read at ffffffff8117ea75
    #12 [ffff8817fc9e1f30] sys_read at ffffffff8117ebb1
    #13 [ffff8817fc9e1f80] system_call_fastpath at ffffffff8100b0f2
        RIP: 000000370d0d83f0  RSP: 00007fff183a9450  RFLAGS: 00010202
        RAX: 0000000000000000  RBX: ffffffff8100b0f2  RCX: 0000000002160040
        RDX: 0000000000000400  RSI: 00007feb7bb8a000  RDI: 0000000000000003
        RBP: 000000000000079b   R8: 74616462616c7320   R9: 3020202020202061
        R10: 2030202020202020  R11: 0000000000000246  R12: 0000000000000000
        R13: 000000000000000a  R14: 000000000215a010  R15: 000000000000000a
        ORIG_RAX: 0000000000000000  CS: 0033  SS: 002b

    The fault is in s_show:

    crash> dis -l s_show
    dis: s_show: duplicate text symbols found:
    ffffffff81023b70 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/arch/x86/kernel/cpu/mcheck/mce-severity.c: 162
    ffffffff810b2d30 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/kernel/kallsyms.c: 461
    ffffffff810f1800 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/kernel/trace/trace.c: 1984
    ffffffff8114e360 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/vmalloc.c: 2452
    ffffffff81166420 (t) s_show /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236

    There are several definitions of s_show, so disassemble the faulting address instead:

    [exception RIP: s_show+228]
    RIP: ffffffff81166504

    crash> dis -l ffffffff81166504
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4258
    0xffffffff81166504 <s_show+228>:        mov    (%rsi),%rsi

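    The file and line number show that this is the s_show in slab.c. The register dump in the backtrace shows that rsi is NULL at the faulting instruction, mov (%rsi),%rsi, which is why we get a NULL pointer dereference oops.

    Next we need to work out why rsi is NULL.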
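    The byte marked <48> in the "Code:" line of the oops is where the faulting instruction starts, and it can be decoded by hand to confirm the disassembly. The following is a minimal Python sketch, assuming only the plain register-indirect form of opcode 8b (no SIB byte, no displacement). The helper name decode_mov_load is hypothetical.

```python
# 64-bit register names indexed by their 3-bit encoding (no REX.R/REX.B extension).
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def decode_mov_load(code):
    """Decode 'REX.W 8b /r' with mod == 00, i.e. mov (%base),%dest in AT&T syntax.

    Hypothetical helper: it handles only the register-indirect form seen in
    this oops and raises ValueError for anything else.
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain 64-bit mov load")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0 or rm in (4, 5):  # rm 4 needs a SIB byte, rm 5 is RIP-relative
        raise ValueError("addressing form not handled by this sketch")
    return "mov (%{}),%{}".format(REGS[rm], REGS[reg])

# Bytes starting at the <48> marker in the oops "Code:" line.
faulting = bytes.fromhex("488b36")
print(decode_mov_load(faulting))
```

    The ModRM byte 0x36 selects rsi both as the base register and as the destination, which agrees with the instruction reported at s_show+228.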

    crash> dis -l ffffffff81166420
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
    0xffffffff81166420 <s_show>:    push   %rbp
    0xffffffff81166421 <s_show+1>:  mov    %rsp,%rbp
    0xffffffff81166424 <s_show+4>:  push   %r15
    0xffffffff81166426 <s_show+6>:  push   %r14
    0xffffffff81166428 <s_show+8>:  push   %r13
    0xffffffff8116642a <s_show+10>: push   %r12
    0xffffffff8116642c <s_show+12>: push   %rbx
    0xffffffff8116642d <s_show+13>: sub    $0x58,%rsp
    0xffffffff81166431 <s_show+17>: nopl   0x0(%rax,%rax,1)
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
    0xffffffff81166436 <s_show+22>: mov    %rsi,%rax
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
    0xffffffff81166439 <s_show+25>: mov    %rdi,-0x58(%rbp)
    0xffffffff8116643d <s_show+29>: mov    %rsi,-0x50(%rbp)
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
    0xffffffff81166441 <s_show+33>: sub    $0x8058,%rax-------------------- this corresponds to source line 4237
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/include/linux/nodemask.h: 239
    0xffffffff81166447 <s_show+39>: mov    $0x200,%esi
    0xffffffff8116644c <s_show+44>: mov    $0xffffffff81c05280,%rdi
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
    0xffffffff81166453 <s_show+51>: mov    %rax,-0x38(%rbp)
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/include/linux/nodemask.h: 239
    0xffffffff81166457 <s_show+55>: callq  0xffffffff81275a10 <find_first_bit>
    0xffffffff8116645c <s_show+60>: cmp    $0x200,%eax
    0xffffffff81166461 <s_show+65>: mov    %eax,%edx
    0xffffffff81166463 <s_show+67>: mov    $0x200,%eax
    0xffffffff81166468 <s_show+72>: cmovg  %eax,%edx
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4250
    0xffffffff8116646b <s_show+75>: cmp    $0x1ff,%edx
    0xffffffff81166471 <s_show+81>: jg     0xffffffff81166730 <s_show+784>
    0xffffffff81166477 <s_show+87>: xor    %r13d,%r13d
    0xffffffff8116647a <s_show+90>: movq   $0x0,-0x48(%rbp)
    0xffffffff81166482 <s_show+98>: movq   $0x0,-0x40(%rbp)
    0xffffffff8116648a <s_show+106>:        xor    %r15d,%r15d
    0xffffffff8116648d <s_show+109>:        xor    %ebx,%ebx
    0xffffffff8116648f <s_show+111>:        xor    %r12d,%r12d
    0xffffffff81166492 <s_show+114>:        nopw   0x0(%rax,%rax,1)
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4251
    0xffffffff81166498 <s_show+120>:        mov    -0x38(%rbp),%rcx
    0xffffffff8116649c <s_show+124>:        movslq %edx,%rax
    0xffffffff8116649f <s_show+127>:        mov    0x8068(%rcx,%rax,8),%rax
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4252
    0xffffffff811664a7 <s_show+135>:        test   %rax,%rax
    0xffffffff811664aa <s_show+138>:        je     0xffffffff811666f8 <s_show+728>
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4256
    0xffffffff811664b0 <s_show+144>:        lea    0x40(%rax),%rcx
    0xffffffff811664b4 <s_show+148>:        mov    %rax,-0x60(%rbp)
    0xffffffff811664b8 <s_show+152>:        mov    %edx,-0x70(%rbp)
    0xffffffff811664bb <s_show+155>:        mov    %rcx,%rdi
    0xffffffff811664be <s_show+158>:        mov    %rcx,-0x68(%rbp)
    0xffffffff811664c2 <s_show+162>:        callq  0xffffffff814fcc50 <_spin_lock_irq>
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4258
    0xffffffff811664c7 <s_show+167>:        mov    -0x60(%rbp),%rax
    0xffffffff811664cb <s_show+171>:        mov    -0x70(%rbp),%edx
    0xffffffff811664ce <s_show+174>:        mov    -0x68(%rbp),%rcx
    0xffffffff811664d2 <s_show+178>:        mov    0x10(%rax),%rsi
    0xffffffff811664d6 <s_show+182>:        lea    0x10(%rax),%rdi
    0xffffffff811664da <s_show+186>:        cmp    %rdi,%rsi
    0xffffffff811664dd <s_show+189>:        je     0xffffffff81166513 <s_show+243>
    0xffffffff811664df <s_show+191>:        mov    -0x38(%rbp),%r8
    0xffffffff811664e3 <s_show+195>:        mov    0x8018(%r8),%r9d
    0xffffffff811664ea <s_show+202>:        mov    %r9d,%r8d
    0xffffffff811664ed <s_show+205>:        nopl   (%rax)
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4259
    0xffffffff811664f0 <s_show+208>:        test   %r13,%r13
    0xffffffff811664f3 <s_show+211>:        jne    0xffffffff81166504 <s_show+228>
    0xffffffff811664f5 <s_show+213>:        cmp    %r9d,0x20(%rsi)
    0xffffffff811664f9 <s_show+217>:        mov    $0xffffffff817a234e,%r10
    0xffffffff81166500 <s_show+224>:        cmovne %r10,%r13
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4258
    0xffffffff81166504 <s_show+228>:        mov    (%rsi),%rsi

    Source line 4258 tells us that the fault happened while walking the slabs_full list:

       4235 static int s_show(struct seq_file *m, void *p)
       4236 {
       4237         struct kmem_cache *cachep = list_entry(p, struct kmem_cache, next);
       4238         struct slab *slabp;
       4239         unsigned long active_objs;
       4240         unsigned long num_objs;
       4241         unsigned long active_slabs = 0;
       4242         unsigned long num_slabs, free_objects = 0, shared_avail = 0;
       4243         const char *name;
       4244         char *error = NULL;
       4245         int node;
       4246         struct kmem_list3 *l3;
       4247
       4248         active_objs = 0;
       4249         num_slabs = 0;
       4250         for_each_online_node(node) {
       4251                 l3 = cachep->nodelists[node];
       4252                 if (!l3)
       4253                         continue;
       4254
       4255                 check_irq_on();
       4256                 spin_lock_irq(&l3->list_lock);
       4257
       4258                 list_for_each_entry(slabp, &l3->slabs_full, list) {
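    list_for_each_entry keeps following next pointers until it arrives back at the list head, and it never checks for NULL. The Python model below is only a sketch of that behaviour: memory is a dict from an address to the next pointer stored there, and all addresses are made up for illustration.

```python
def walk_like_kernel(memory, head):
    """Follow next pointers from head the way list_for_each_entry does.

    memory maps an address to the 'next' pointer stored there. The loop ends
    only when it returns to head; there is no NULL check, so reading address 0
    is the model's equivalent of the page fault in the oops.
    """
    visited = []
    pos = memory[head]
    while pos != head:
        if pos == 0:
            raise MemoryError("NULL dereference after %d nodes" % len(visited))
        visited.append(pos)
        pos = memory[pos]
    return visited

# Hypothetical addresses: a healthy circular list and one with a corrupted node.
healthy = {0x10: 0x100, 0x100: 0x200, 0x200: 0x10}
corrupted = {0x10: 0x100, 0x100: 0x200, 0x200: 0}

print([hex(a) for a in walk_like_kernel(healthy, 0x10)])
try:
    walk_like_kernel(corrupted, 0x10)
except MemoryError as exc:
    print(exc)
```

    A single node whose next pointer has been overwritten with 0 is enough to turn the walk into a NULL dereference, which is the pattern to look for in the dump.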

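    To get slabp we have to resolve l3; to resolve l3 we need cachep; and to resolve cachep we need the void *p argument, which according to the backtrace is passed in from seq_read. Let's see what p actually is:

    According to the disassembly, p points to a list head that is embedded in struct kmem_cache: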

    crash> struct -xo kmem_cache
    struct kmem_cache {
         [0x0] struct array_cache *array[4096];
      [0x8000] unsigned int batchcount;
      [0x8004] unsigned int limit;
      [0x8008] unsigned int shared;
      [0x800c] unsigned int buffer_size;
      [0x8010] u32 reciprocal_buffer_size;
      [0x8014] unsigned int flags;
      [0x8018] unsigned int num;
      [0x801c] unsigned int gfporder;
      [0x8020] gfp_t gfpflags;
      [0x8028] size_t colour;
      [0x8030] unsigned int colour_off;
      [0x8038] struct kmem_cache *slabp_cache;
      [0x8040] unsigned int slab_size;
      [0x8044] unsigned int dflags;
      [0x8048] void (*ctor)(void *);
      [0x8050] const char *name;
      [0x8058] struct list_head next;------------ the embedded list head

    Disassemble the function:

    crash> dis -l 0xffffffff81166420
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
    0xffffffff81166420 <s_show>:    push   %rbp
    0xffffffff81166421 <s_show+1>:  mov    %rsp,%rbp
    0xffffffff81166424 <s_show+4>:  push   %r15
    0xffffffff81166426 <s_show+6>:  push   %r14
    0xffffffff81166428 <s_show+8>:  push   %r13
    0xffffffff8116642a <s_show+10>: push   %r12
    0xffffffff8116642c <s_show+12>: push   %rbx
    0xffffffff8116642d <s_show+13>: sub    $0x58,%rsp
    0xffffffff81166431 <s_show+17>: nopl   0x0(%rax,%rax,1)
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
    0xffffffff81166436 <s_show+22>: mov    %rsi,%rax------------------------------------- rsi is copied to rax; rsi holds p, the second argument of s_show
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4236
    0xffffffff81166439 <s_show+25>: mov    %rdi,-0x58(%rbp)
    0xffffffff8116643d <s_show+29>: mov    %rsi,-0x50(%rbp)------------------------------ rsi is also saved on the stack, so p can be recovered from rbp-0x50
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
    0xffffffff81166441 <s_show+33>: sub    $0x8058,%rax
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/include/linux/nodemask.h: 239
    0xffffffff81166447 <s_show+39>: mov    $0x200,%esi
    0xffffffff8116644c <s_show+44>: mov    $0xffffffff81c05280,%rdi
    /usr/src/debug/kernel-2.6.32-220.el6/linux-2.6.32-220.el6.x86_64/mm/slab.c: 4237
    0xffffffff81166453 <s_show+51>: mov    %rax,-0x38(%rbp)

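    Take rbp from the stack frame and subtract 0x50 to locate p. With RBP = ffff8817fc9e1e18 the slot is ffff8817fc9e1dc8, which holds ffff880c2fd18498: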

     #8 [ffff8817fc9e1ce0] page_fault at ffffffff814fd255
        [exception RIP: s_show+228]
        RIP: ffffffff81166504  RSP: ffff8817fc9e1d98  RFLAGS: 00010086
        RAX: ffff880c2fc217c0  RBX: 00000000000003fb  RCX: ffff880c2fc21800
        RDX: 0000000000000000  RSI: 0000000000000000  RDI: ffff880c2fc217d0
        RBP: ffff8817fc9e1e18   R8: 0000000000000001   R9: 0000000000000001
        R10: ffffffff817a234e  R11: 0000000000000246  R12: 00000000000003fb
        R13: ffffffff817a234e  R14: 0000000000000400  R15: 0000000000000000
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        ffff8817fc9e1ce8: 0000000000000000 0000000000000400
        ffff8817fc9e1cf8: ffffffff817a234e 00000000000003fb
        ffff8817fc9e1d08: ffff8817fc9e1e18 00000000000003fb
        ffff8817fc9e1d18: 0000000000000246 ffffffff817a234e
        ffff8817fc9e1d28: 0000000000000001 0000000000000001
        ffff8817fc9e1d38: ffff880c2fc217c0 ffff880c2fc21800
        ffff8817fc9e1d48: 0000000000000000 0000000000000000
        ffff8817fc9e1d58: ffff880c2fc217d0 ffffffffffffffff
        ffff8817fc9e1d68: ffffffff81166504 0000000000000010
        ffff8817fc9e1d78: 0000000000010086 ffff8817fc9e1d98
        ffff8817fc9e1d88: 0000000000000018 ffffffff811664c7
        ffff8817fc9e1d98: ffff8812666d8080 ffff881158954000
        ffff8817fc9e1da8: ffff881700000000 ffff880c2fc21800
        ffff8817fc9e1db8: ffff880c2fc217c0 ffff8817f3eee740
        ffff8817fc9e1dc8: ffff880c2fd18498 0000000000000000
        ffff8817fc9e1dd8: 0000000000000000 ffff880c2fd10440
        ffff8817fc9e1de8: ffff8817fc9e1e18 ffff8817f3eee740
        ffff8817fc9e1df8: ffff880c9ddbba80 ffff880c2fd18498
        ffff8817fc9e1e08: 0000000000000400 ffff8817fc9e1e60
        ffff8817fc9e1e18: ffff8817fc9e1e98 ffffffff811a0a35
     #9 [ffff8817fc9e1e20] seq_read at ffffffff811a0a35
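    Reading the right slot out of a raw frame dump by eye is error-prone, so the lookup can be sketched in a few lines of Python. The parser below assumes the "address: word word" layout of the dump above; parse_stack_dump is a hypothetical helper name.

```python
def parse_stack_dump(text):
    """Turn 'addr: word word' lines into a dict of address -> 64-bit value."""
    words = {}
    for line in text.strip().splitlines():
        addr_part, values = line.split(":")
        addr = int(addr_part, 16)
        for i, word in enumerate(values.split()):
            words[addr + 8 * i] = int(word, 16)
    return words

# Three rows copied from the frame dump above.
dump = """
ffff8817fc9e1dc8: ffff880c2fd18498 0000000000000000
ffff8817fc9e1dd8: 0000000000000000 ffff880c2fd10440
ffff8817fc9e1de8: ffff8817fc9e1e18 ffff8817f3eee740
"""
stack = parse_stack_dump(dump)
rbp = 0xffff8817fc9e1e18

p = stack[rbp - 0x50]        # second argument, saved by s_show+29
cachep = stack[rbp - 0x38]   # list_entry() result, saved by s_show+51
print(hex(p), hex(cachep))
```

    The two saved values differ by exactly 0x8058, the offset of the next member, which is a useful sanity check that the right frame was read.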
    
    crash> struct -xo kmem_cache
    struct kmem_cache {
         [0x0] struct array_cache *array[4096];
      [0x8000] unsigned int batchcount;
      [0x8004] unsigned int limit;
      [0x8008] unsigned int shared;
      [0x800c] unsigned int buffer_size;
      [0x8010] u32 reciprocal_buffer_size;
      [0x8014] unsigned int flags;
      [0x8018] unsigned int num;
      [0x801c] unsigned int gfporder;
      [0x8020] gfp_t gfpflags;
      [0x8028] size_t colour;
      [0x8030] unsigned int colour_off;
      [0x8038] struct kmem_cache *slabp_cache;
      [0x8040] unsigned int slab_size;
      [0x8044] unsigned int dflags;
      [0x8048] void (*ctor)(void *);
      [0x8050] const char *name;
      [0x8058] struct list_head next;
      [0x8068] struct kmem_list3 *nodelists[512];
    }
    SIZE: 0x9068
    crash> px 0xffff880c2fd18498-0x8058
    $4 = 0xffff880c2fd10440
    crash> struct kmem_cache 0xffff880c2fd10440
    struct kmem_cache {
      array = {0xffff880c2fe93180, 0xffff880c1199cec0, 0xffff880c1199c6c0, 0xffff880c11a4cd80, 0xffff881811e1f580, 0xffff881811e1fd80, 0xffff881811e796c0, 0xffff881811e79ec0, 0xffff880c11a4c580, 0xffff880c11ae6cc0, 0xffff880c11ae64c0, 0xffff880c11b2bb80, 0xffff881811eee780, 0xffff881811f3a0c0, 0xffff881811f3a8c0, 0xffff881811f7d180, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...},
      batchcount = 12,
      limit = 24,
      shared = 8,
      buffer_size = 4096,
      reciprocal_buffer_size = 1048576,
      flags = 2147753984,
      num = 1,
      gfporder = 0,
      gfpflags = 0,
      colour = 0,
      colour_off = 64,
      slabp_cache = 0xffff880c2fc40100,--------------- the slab management data is kept separate from the slab's objects
      slab_size = 52,
      dflags = 0,
      ctor = 0x0,
      name = 0xffffffff817a24d8 "size-4096",
      next = {
        next = 0xffff880c2fd08458,
        prev = 0xffff880c2fd284d8
      },
      nodelists = {0xffff880c2fc217c0, 0xffff88182fc007c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...}
    }
    With this value of p, we have established that the NULL pointer was hit while walking a slabs_full list reached through the nodelists of the kmem_cache at 0xffff880c2fd10440.
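    The px computation is the list_entry (container_of) arithmetic done by hand. A small Python sketch of the same arithmetic, using the offsets printed by struct -xo kmem_cache; the function name list_entry simply mirrors the kernel macro and the variable names are illustrative.

```python
NEXT_OFFSET = 0x8058       # offset of kmem_cache.next from 'struct -xo kmem_cache'
NODELISTS_OFFSET = 0x8068  # offset of kmem_cache.nodelists

def list_entry(member_addr, member_offset):
    """Address of the enclosing structure, given the address of one member."""
    return member_addr - member_offset

p = 0xffff880c2fd18498
cachep = list_entry(p, NEXT_OFFSET)
# Where nodelists[node] is stored, matching 'mov 0x8068(%rcx,%rax,8),%rax'.
slot_node0 = cachep + NODELISTS_OFFSET + 8 * 0
slot_node1 = cachep + NODELISTS_OFFSET + 8 * 1
print(hex(cachep), hex(slot_node0), hex(slot_node1))
```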

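    However, nodelists is an array, so we first need to determine which index was being accessed when the fault occurred. Fortunately the system has only two nodes, so we simply start at index 0 and take a look: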

    crash> struct -xo kmem_list3
    struct kmem_list3 {
       [0x0] struct list_head slabs_partial;
      [0x10] struct list_head slabs_full;
      [0x20] struct list_head slabs_free;
      [0x30] unsigned long free_objects;
      [0x38] unsigned int free_limit;
      [0x3c] unsigned int colour_next;
      [0x40] spinlock_t list_lock;
      [0x48] struct array_cache *shared;
      [0x50] struct array_cache **alien;
      [0x58] unsigned long next_reap;
      [0x60] int free_touched;
    }
    SIZE: 0x68
    crash> rd 0xffff880c2fc217c0 40
    ffff880c2fc217c0:  ffff880c2fc217c0 ffff880c2fc217c0   .../......./....
    ffff880c2fc217d0:  ffff88049dd9ea80 ffff880c2fc23380   .........3./....
    ffff880c2fc217e0:  ffff880c2fc217e0 ffff880c2fc217e0   .../......./....
    ffff880c2fc217f0:  0000000000000000 0000000000000061   ........a.......
    ffff880c2fc21800:  0000000006d606d5 ffff880c2fe78800   .........../....
    ffff880c2fc21810:  ffff880c2fc203e0 00000001030105e7   .../............
    ffff880c2fc21820:  0000000000000000 0000000000000000   ................
    ffff880c2fc21830:  0000000000000000 0000000000000000   ................
    ffff880c2fc21840:  ffff880c2fc21840 ffff880c2fc21840   @../....@../....
    ffff880c2fc21850:  ffff880c2fc21850 ffff880c2fc21850   P../....P../....
    ffff880c2fc21860:  ffff880c2fc21860 ffff880c2fc21860   `../....`../....
    ffff880c2fc21870:  0000000000000000 0000000000000061   ........a.......
    ffff880c2fc21880:  00000000058f058f ffff880c2fe78400   .........../....
    ffff880c2fc21890:  ffff880c2fc20400 00000001030105e7   .../............
    ffff880c2fc218a0:  0000000000000000 0000000000000000   ................
    ffff880c2fc218b0:  0000000000000000 0000000000000000   ................
    ffff880c2fc218c0:  ffff880c2fc218c0 ffff880c2fc218c0   .../......./....
    ffff880c2fc218d0:  ffff88079399db40 ffff880c2fc231c0   @........1./....
    ffff880c2fc218e0:  ffff8807989ff200 ffff88049dc62780   .........'......
    ffff880c2fc218f0:  0000000000000012 0000000000000021   ........!.......
    crash> slab ffff880c2fc217e0
    struct slab {
      list = {
        next = 0xffff880c2fc217e0,
        prev = 0xffff880c2fc217e0
      },
      colouroff = 0,
      s_mem = 0x61,
      inuse = 114689749,
      free = 0,
      nodeid = 34816
    }
    crash> slab ffff880c2fc217c0
    struct slab {
      list = {
        next = 0xffff880c2fc217c0,
        prev = 0xffff880c2fc217c0
      },
      colouroff = 18446612152142391936,
      s_mem = 0xffff880c2fc23380,
      inuse = 801249248,
      free = 4294936588,
      nodeid = 6112
    }

    The output above shows that for nodelists[0] the slabs_free and slabs_partial lists are empty; only slabs_full contains entries.
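    An empty list_head is one whose next pointer refers back to the head itself, so the three lists can be classified straight from the rd output. The sketch below does that and also compares the oops registers with nodelists[0]. Reading RAX as l3 and RDI as &l3->slabs_full is my interpretation of the lea 0x10(%rax),%rdi at s_show+182, so treat the node 0 conclusion as a cross-check and not as proof.

```python
def is_empty(head_addr, next_ptr):
    """An empty list_head points back at itself."""
    return next_ptr == head_addr

l3 = 0xffff880c2fc217c0  # nodelists[0]
# (offset, name, next pointer) taken from the 'rd' output above.
heads = [
    (0x00, "slabs_partial", 0xffff880c2fc217c0),
    (0x10, "slabs_full", 0xffff88049dd9ea80),
    (0x20, "slabs_free", 0xffff880c2fc217e0),
]
for offset, name, next_ptr in heads:
    state = "empty" if is_empty(l3 + offset, next_ptr) else "non-empty"
    print(name, state)

# Cross-check against the oops registers: RAX is l3 and RDI is &l3->slabs_full.
rax, rdi = 0xffff880c2fc217c0, 0xffff880c2fc217d0
print(rax == l3 and rdi == l3 + 0x10)
```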

    The slabs_full list head is at offset 0x10 from the address stored in nodelists[i], so walk it:

    crash> list -s slab.inuse 0xffff88182fc007d0 >caq.slab_1
    crash> list -s slab.inuse 0xffff880c2fc217d0 >caq.slab_0

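    The list command treats NULL as the end of the list, so at first it reported no error and I thought the address I had worked out was wrong. Only after opening the output file did I find that the slab list really was broken. Keep in mind that list stops both when it reaches NULL and when it loops back to where it started.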

    2035 ffff88049dd9af40
    2036 inuse = 1
    2037 ffff880799a9e2c0
    2038 inuse = 0------------------ a slab on the full list cannot have inuse == 0
    2039 ffff880bee40a000
    2040 inuse = 32768
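    Scanning thousands of entries by eye is tedious. For the size-4096 cache num is 1, so every slab on the full list should report inuse = 1, and the first entry that does not is the one to inspect. A minimal Python sketch of that scan follows; it assumes the output file alternates between an address line and an "inuse = N" line, and that the numbers 2035 to 2040 above are editor line numbers that are not part of the file. find_first_bad is a hypothetical helper.

```python
def find_first_bad(lines, expected_inuse):
    """Return (address, inuse) of the first slab whose inuse is unexpected.

    lines alternates between a slab address and an 'inuse = N' line, the
    layout assumed here for the output of 'list -s slab.inuse'.
    """
    addr = None
    for line in lines:
        line = line.strip()
        if line.startswith("inuse"):
            inuse = int(line.split("=")[1])
            if inuse != expected_inuse:
                return addr, inuse
        elif line:
            addr = int(line, 16)
    return None

# The three entries quoted above, without the editor's line numbers.
sample = [
    "ffff88049dd9af40", "  inuse = 1",
    "ffff880799a9e2c0", "  inuse = 0",
    "ffff880bee40a000", "  inuse = 32768",
]
bad = find_first_bad(sample, expected_inuse=1)
print(hex(bad[0]), bad[1])
```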

    Look at the contents:

    crash> slab ffff880799a9e2c0
    struct slab {
      list = {
        next = 0xffff880bee40a000,
        prev = 0x10000000000
      },
      colouroff = 18446612152141553664,
      s_mem = 0xb00000100,
      inuse = 0,
      free = 0,
      nodeid = 0
    }
    crash> slab 0xffff880bee40a000
    struct slab {
      list = {
        next = 0x0,----------------------- here is the NULL pointer
        prev = 0x2185b85600020
      },
      colouroff = 16384,
      s_mem = 0x2187025e00020,
      inuse = 32768,
      free = 0,
      nodeid = 32
    }

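    There is the NULL pointer. The prev pointer of this slab descriptor can no longer be trusted, so to find the previous slab we go back to ffff880799a9e2c0. Its data turns out to be bad, and that is the root cause of the oops: the contents of ffff880799a9e2c0 are not a valid slab, so following its next pointer leads to the fault. Let's look at the memory around ffff880799a9e2c0: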

    ffff880799a9e200:  0000002800000000 006e280a00000000   ....(........(n.
    ffff880799a9e210:  000005240000230d 20f6cb8000000555   .#..$...U......
    ffff880799a9e220:  002c000000002e1c 006e127600000000   ......,.....v.n.
    ffff880799a9e230:  0000000000000000 0000000000000000   ................
    ffff880799a9e240:  0000000000000000 ffff88040000051d   ................
    ffff880799a9e250:  0000000300000004 ffff880b356be9c0   ..........k5....
    ffff880799a9e260:  6664732f7665642f 0000000000000000   /dev/sdf........
    ffff880799a9e270:  0000000000000000 0000000000000000   ................
    ffff880799a9e280:  0000000000000000 0000000000000000   ................
    ffff880799a9e290:  0000000000000000 0000000000000000   ................
    ffff880799a9e2a0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2b0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2c0:  ffff880bee40a000 0000010000000000   ..@.............------------------------------- contents at ffff880799a9e2c0
    ffff880799a9e2d0:  ffff88049dcd2000 0000000b00000100   . ..............
    ffff880799a9e2e0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2f0:  0000000000000000 0000000000000000   ................
    ffff880799a9e300:  00000000000f0015 30305f3661adaa00   ...........a6_00
    ffff880799a9e310:  5f613030305f3031 3130303030303030   10_000a_00000001
    ffff880799a9e320:  0020000000002900 0000000000000000   .).... .........
    ffff880799a9e330:  0000000000000000 0000000000000000   ................
    ffff880799a9e340:  0000000000000000 0000280a00000000   .............(..
    ffff880799a9e350:  0000000000000000 0000000000000000   ................
    ffff880799a9e360:  0000000000000000 ffffffff81099c20   ........ .......
    ffff880799a9e370:  0000000000000000 0000000000000000   ................
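    The raw words at ffff880799a9e2c0 can be mapped onto struct slab by hand to see which fields the garbage landed in. The Python sketch below assumes the x86_64 layout of struct slab in this kernel (two list pointers, colouroff, s_mem, then inuse, free and nodeid), which is consistent with the field values crash printed for this address earlier. decode_slab is a hypothetical helper.

```python
import struct

# Assumed x86_64 layout of struct slab in this kernel: list.next, list.prev,
# colouroff, s_mem (8 bytes each), inuse, free (4 bytes each), nodeid (2 bytes).
SLAB_FORMAT = "<QQQQIIH"
FIELDS = ("next", "prev", "colouroff", "s_mem", "inuse", "free", "nodeid")

def decode_slab(words):
    """Decode struct slab from consecutive 64-bit words as printed by 'rd'."""
    raw = b"".join(struct.pack("<Q", w) for w in words)
    values = struct.unpack_from(SLAB_FORMAT, raw)
    return dict(zip(FIELDS, values))

# The words at ffff880799a9e2c0 .. ffff880799a9e2ef from the dump above.
words = [
    0xffff880bee40a000, 0x0000010000000000,
    0xffff88049dcd2000, 0x0000000b00000100,
    0x0000000000000000, 0x0000000000000000,
]
slab = decode_slab(words)
for name in FIELDS:
    print(name, hex(slab[name]))
```

    A prev of 0x10000000000 and an s_mem of 0xb00000100 are not kernel addresses, which supports the view that the whole descriptor was overwritten and not just one pointer.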

    Check what kind of data ffff880799a9e2c0 itself belongs to:

    crash> kmem ffff880799a9e2c0
    CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
    ffff880c2fc40100 size-64                   64    1099970   1116870  18930     4k
    SLAB              MEMORY            TOTAL  ALLOCATED  FREE
    ffff880799a9e000  ffff880799a9e140     59         48    11
    FREE / [ALLOCATED]
      [ffff880799a9e2c0]
    
          PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
    ffffea001a99d290  799a9e000                0        0  1 40000000000080 slab
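    From the kmem output it is easy to work out which object of the size-64 slab the address falls in, and whether it points at the start of that object. A short Python sketch, using the MEMORY and OBJSIZE columns above; object_index is a hypothetical helper name.

```python
def object_index(addr, s_mem, obj_size):
    """Return (index, offset inside the object) for an address in a slab."""
    return divmod(addr - s_mem, obj_size)

# Values from the kmem output: MEMORY column and OBJSIZE of size-64.
index, inner = object_index(0xffff880799a9e2c0, 0xffff880799a9e140, 64)
print(index, inner)
```

    An inner offset of 0 means the address is the start of an object, which is what we expect if that object is being used as a slab descriptor.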

    Because a kmem_cache can keep its slab management data separate from the slab's objects, look at the slabp_cache member of struct kmem_cache 0xffff880c2fd10440. Its value is 0xffff880c2fc40100, which is itself a kmem_cache:

    crash> struct -x kmem_cache 0xffff880c2fc40100
    struct kmem_cache {
      array = {0xffff880c2fe9c000, 0xffff880c1199ec00, 0xffff880c11a3f000, 0xffff880c11a4f400, 0xffff881811e42000, 0xffff881811e85000, 0xffff881811ec1000, 0xffff881811ef4000, 0xffff880c11a99800, 0xffff880c11ae9c00, 0xffff880c11af9000, 0xffff880c11b2f400, 0xffff881811f26000, 0xffff881811f3f000, 0xffff881811f5d000, 0xffff881811f83000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...},
      batchcount = 0x3c,
      limit = 0x78,
      shared = 0x8,
      buffer_size = 0x40,
      reciprocal_buffer_size = 0x4000000,
      flags = 0x42000,
      num = 0x3b,
      gfporder = 0x0,
      gfpflags = 0x0,
      colour = 0x0,
      colour_off = 0x40,
      slabp_cache = 0x0,
      slab_size = 0x140,
      dflags = 0x0,
      ctor = 0x0,
      name = 0xffffffff817a2435 "size-64",
      next = {
        next = 0xffff880c2fc38118,
        prev = 0xffff880c2fc58198
      },
      nodelists = {0xffff880c2fc21140, 0xffff88182fc00140, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0...}
    }
    crash> struct kmem_list3 0xffff880c2fc217c0
    struct kmem_list3 {
      slabs_partial = {
        next = 0xffff880c2fc217c0,
        prev = 0xffff880c2fc217c0
      },
      slabs_full = {
        next = 0xffff88049dd9ea80,
        prev = 0xffff880c2fc23380
      },
      slabs_free = {
        next = 0xffff880c2fc217e0,
        prev = 0xffff880c2fc217e0
      },
      free_objects = 0,---------------- no free objects
      free_limit = 97,
      colour_next = 0,
      list_lock = {
        raw_lock = {
          slock = 114689749
        }
      },
      shared = 0xffff880c2fe78800,
      alien = 0xffff880c2fc203e0,
      next_reap = 4345365991,
      free_touched = 0
    }

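    The slabp_cache is the size-64 kmem_cache. In other words, the slab management data of the size-4096 cache is simply an object of the size-64 cache. The problem now is that this object has been corrupted, at address ffff880799a9e2c0.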
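    The slab_size values printed for the two caches can be reproduced with a little arithmetic, which also shows why a size-4096 descriptor fits in a size-64 object. This Python sketch rests on two assumptions that are not taken from the dump: sizeof(struct slab) is 48 bytes on x86_64, and the on-slab descriptor of size-64 is aligned to colour_off (64).

```python
SIZEOF_SLAB = 48    # assumed sizeof(struct slab) on x86_64 (42 bytes padded to 48)
SIZEOF_BUFCTL = 4   # kmem_bufctl_t is an unsigned int

def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def descriptor_size(num_objs, on_slab_align=None):
    """Size of struct slab plus its bufctl array; aligned only when on-slab."""
    size = SIZEOF_SLAB + num_objs * SIZEOF_BUFCTL
    return size if on_slab_align is None else align_up(size, on_slab_align)

print(descriptor_size(1))            # size-4096: num = 1, descriptor off-slab
print(hex(descriptor_size(59, 64)))  # size-64: num = 0x3b, descriptor on-slab
```

    Under those assumptions the results match slab_size = 52 for size-4096 and slab_size = 0x140 for size-64, and 52 bytes fit inside one 64-byte object.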

    Let me search for any other memory with the same contents:
    crash> search ffff880bee40a000
    ffff88049dd70d18: ffff880bee40a000
    ffff880799a9e2c0: ffff880bee40a000
    ffff880be23cd9c0: ffff880bee40a000

    Running rd on each of the three addresses shows that ffff880799a9e2c0 and ffff880be23cd9c0 hold the same contents:

    crash> rd ffff880be23cd9c0   8
    ffff880be23cd9c0:  ffff880bee40a000 0000010000000000   ..@.............--------------------- the possible source
    ffff880be23cd9d0:  ffff88049dcd2000 0000000b00000100   . ..............
    ffff880be23cd9e0:  0000000000000000 0000000000000000   ................
    ffff880be23cd9f0:  0000000000000000 0000000000000000   ................
    crash> rd ffff880799a9e2c0 8
    ffff880799a9e2c0:  ffff880bee40a000 0000010000000000   ..@.............---------------------- the corrupted copy
    ffff880799a9e2d0:  ffff88049dcd2000 0000000b00000100   . ..............
    ffff880799a9e2e0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2f0:  0000000000000000 0000000000000000   ................

    As shown above, the two addresses hold exactly the same content.

    crash> kmem ffff880be23cd9c0
    CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
    ffff880c2fc00040 size-32                   32      38254     39200    350     4k
    SLAB              MEMORY            TOTAL  ALLOCATED  FREE
    ffff880be23cd000  ffff880be23cd200    112         66    46
    FREE / [ALLOCATED]
      [ffff880be23cd9c0]
    
          PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
    ffffea002997d4d8  be23cd000                0       64  1 40000000000080 slab

     Look at how this slab's object-management data is being used:

    crash> slab ffff880be23cd000
    struct slab {
      list = {
        next = 0xffff880c0b164000,
        prev = 0xffff880be6f95000
      },
      colouroff = 512,
      s_mem = 0xffff880be23cd200,
      inuse = 105,
      free = 32,---------index of the first free node; the number stored in node 32 then points to the next free node
      nodeid = 0
    }
    crash> rd -32 0xffff880c0b164030  112
    ffff880c0b164030:  00000015 00000002 00000003 00000026   ............&...
    ffff880c0b164040:  00000023 ffffffff 0000002e 00000006   #...............
    ffff880c0b164050:  0000002e ffffffff ffffffff 0000001f   ................
    ffff880c0b164060:  0000000d 0000002a 0000003f 00000016   ....*...?.......
    ffff880c0b164070:  00000011 0000000f 00000013 0000002a   ............*...
    ffff880c0b164080:  00000036 00000021 0000002c 0000003c   6...!...,...<...
    ffff880c0b164090:  0000006b 00000018 00000019 00000042   k...........B...
    ffff880c0b1640a0:  0000001b 0000001c 0000001d ffffffff   ................
    ffff880c0b1640b0:  0000002a 00000001 00000003 00000013   *...............
    ffff880c0b1640c0:  00000023 00000024 00000027 0000002b   #...$...'...+...
    ffff880c0b1640d0:  00000029 00000020 0000003f 0000000b   )... ...?.......
    ffff880c0b1640e0:  00000014 0000002c 00000014 0000000e   ....,...........
    ffff880c0b1640f0:  0000002f ffffffff 00000013 0000002f   /.........../...
    ffff880c0b164100:  0000003b 00000008 0000000b 0000003e   ;...........>...
    ffff880c0b164110:  00000033 00000038 0000003b 00000030   3...8...;...0...
    ffff880c0b164120:  0000003d 0000003e ffffffff 00000026   =...>.......&...
    ffff880c0b164130:  00000007 00000040 00000041 00000042   ....@...A...B...
    ffff880c0b164140:  00000043 00000044 00000045 00000046   C...D...E...F...
    ffff880c0b164150:  00000047 00000048 00000049 0000004a   G...H...I...J...
    ffff880c0b164160:  0000004b 0000004c 0000004d 0000004e   K...L...M...N...
    ffff880c0b164170:  0000004f 00000050 00000051 00000052   O...P...Q...R...
    ffff880c0b164180:  00000053 00000054 00000055 00000056   S...T...U...V...
    ffff880c0b164190:  00000057 00000058 00000059 0000005a   W...X...Y...Z...
    ffff880c0b1641a0:  0000005b 0000005c 0000005d 0000005e   [......]...^...
    ffff880c0b1641b0:  0000005f 00000060 00000061 00000062   _...`...a...b...
    ffff880c0b1641c0:  00000063 00000064 00000065 00000066   c...d...e...f...
    ffff880c0b1641d0:  00000067 00000068 00000069 0000006a   g...h...i...j...
    ffff880c0b1641e0:  0000006b 0000006c 0000006d ffffffff   k...l...m.......

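    According to the data above, the free chain in this slab is currently 32-2a-3f-26-27-2b-b-1f (the last one ends the chain and is not counted as free), i.e. 7 free objects. Together with the 105 in-use objects recorded in the slab, the total is 112, so the data is OK.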
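    The chain walk described above can be sketched in a few lines. This is only a minimal illustration on a hypothetical 6-entry array, not the dump itself; the name `walk_free_chain` and the value of `BUFCTL_END` (taken to be the `ffffffff` entries seen in the dump) are assumptions. It returns every index visited in order, so a chain read off by hand can be checked entry by entry.

```python
BUFCTL_END = 0xFFFFFFFF  # assumed end-of-chain marker; appears as ffffffff in the dump


def walk_free_chain(bufctl, first_free):
    """Follow the index chain that starts at slab.free and return the indices visited."""
    chain = []
    seen = set()
    idx = first_free
    while idx != BUFCTL_END:
        if idx >= len(bufctl):
            raise ValueError(f"index {idx} is outside the bufctl array")
        if idx in seen:
            raise ValueError(f"cycle detected at index {idx}")
        seen.add(idx)
        chain.append(idx)
        idx = bufctl[idx]  # each entry holds the index of the next free object
    return chain


# Hypothetical 6-object slab: free = 2, then 2 -> 4 -> 0 -> END.
# Entries 1, 3 and 5 belong to allocated objects and are never visited.
toy_bufctl = [BUFCTL_END, 7, 4, 9, 0, 1]
print(walk_free_chain(toy_bufctl, 2))
```

    Feeding it the 112 values printed by `rd -32` together with the slab's `free` field would reproduce the chain by machine; an out-of-range index or a cycle is reported as an error rather than followed.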

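    We now know that this memory was overwritten, but this address is not necessarily where the overwrite began, so it is necessary to look further back and find the first overwritten address (multiple overwrites cannot be ruled out, of course).

    First, try to restore the original list by using the fact that it is a doubly linked circular list.

    The next pointer of ffff88049dd9af40 points to ffff880799a9e2c0, but the data at ffff880799a9e2c0 is wrong and cannot be used. However, ffff880799a9e2c0 should also be the prev pointer of its next element, so search to see whose memory contains the value ffff880799a9e2c0.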

    crash> search ffff880799a9e2c0
    ffff8800390873c0: ffff880799a9e2c0
    ffff88049dd9af40: ffff880799a9e2c0
    ffff88079685a948: ffff880799a9e2c0
    ffffea00102873c0: ffff880799a9e2c0
    crash> kmem ffff8800390873c0
          PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
    ffffea0000c79d88   39087000                0        0  1 20000000000400 reserved
    crash> kmem ffff88079685a948
    CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
    ffff880c2fc40100 size-64                   64    1099970   1116870  18930     4k
    SLAB              MEMORY            TOTAL  ALLOCATED  FREE
    ffff88079685a000  ffff88079685a140     59         24    35
    FREE / [ALLOCATED]
      [ffff88079685a940]
    
          PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
    ffffea001a8ed3b0  79685a000                0        0  1 40000000000080 slab
    crash> kmem ffffea00102873c0
          PAGE         PHYSICAL      MAPPING       INDEX CNT FLAGS
    ffffea0000c79d88   39087000                0        0  1 20000000000400 reserved
    crash> slab ffff88079685a940
    struct slab {
      list = {
        next = 0xffff88079399a3c0,
        prev = 0xffff880799a9e2c0
      },
      colouroff = 0,
      s_mem = 0xffff8806b875d000,
      inuse = 1,
      free = 4294967295,
      nodeid = 0
    }
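    The search hit was ffff88079685a948, while kmem reports the containing object as ffff88079685a940. The arithmetic behind that can be sketched as below; `object_index` is a hypothetical helper, and the only inputs are the s_mem and OBJSIZE values printed by kmem. A remainder of 8 is what one would expect for `list.prev`, since `list` is the first member of struct slab in the output above and a list_head on x86_64 is two 8-byte pointers.

```python
def object_index(addr, s_mem, objsize):
    """Return (index of the object containing addr, byte offset of addr inside it)."""
    offset = addr - s_mem
    if offset < 0:
        raise ValueError("address lies before the first object of this slab")
    return divmod(offset, objsize)


# Search hit inside the size-64 slab whose s_mem is ffff88079685a140.
index, within = object_index(0xFFFF88079685A948, 0xFFFF88079685A140, 64)
print(index, within)

# The size-32 object examined earlier, in the slab whose s_mem is ffff880be23cd200.
print(object_index(0xFFFF880BE23CD9C0, 0xFFFF880BE23CD200, 32))
```

    The helper does not check the upper bound of the slab; that would need the object count from the TOTAL column.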

    Found it: ffff88079685a940 is the one whose prev is 0xffff880799a9e2c0. Continue traversing from 940.

    list  -s slab.inuse ffff88079685a940
    ....
    
    ffff880795767240
      inuse = 1
    ffff880799a9e280
      inuse = 0

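    The traversal finally reaches 240 and then runs into another abnormal node, 280. As with 0xffff880799a9e2c0, search for whose memory contains the address of 280. In the end the original list is restored as:

    f40->2c0->940->.....->240->280->ac0->....->f40, forming a circular list.

    As can be seen, 280 and 2c0 are contiguous in address and the contents of both have been completely overwritten, so the overwrite was probably a copy that started at 280.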
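    The recovery procedure used here (follow next until a corrupted node is reached, then find its successor as the node whose prev still points back at it) can be sketched as follows. This is a toy model under stated assumptions: `nodes` is a hypothetical dict of address -> (next, prev) standing in for what `search` and `slab` return, and the set of corrupted addresses is assumed to be known already.

```python
def walk_with_repair(nodes, start, corrupted):
    """Walk a circular doubly linked list, bridging nodes whose own pointers are untrusted."""
    order = [start]
    cur = start
    while True:
        if cur in corrupted:
            # The node's own next is garbage; its real successor still has prev == cur.
            candidates = [addr for addr, (_, prev) in nodes.items()
                          if prev == cur and addr not in corrupted]
            if len(candidates) != 1:
                raise ValueError(f"cannot bridge corrupted node {cur:#x}")
            nxt = candidates[0]
        else:
            nxt = nodes[cur][0]
        if nxt == start:
            return order  # back at the starting node: the ring is closed
        if nxt in order:
            raise ValueError(f"unexpected loop at {nxt:#x}")
        order.append(nxt)
        cur = nxt


# Hypothetical ring 0x10 -> 0x20 -> 0x30 -> 0x40 -> 0x10 with node 0x20 overwritten.
toy_nodes = {
    0x10: (0x20, 0x40),
    0x20: (0xDEAD, 0xBEEF),  # overwritten contents
    0x30: (0x40, 0x20),
    0x40: (0x10, 0x30),
}
print([hex(a) for a in walk_with_repair(toy_nodes, 0x10, {0x20})])
```

    In the real dump the role of the `candidates` scan is played by `crash> search`, which also returns hits outside the slab cache (for example the reserved pages above), so each hit still has to be checked with kmem.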

    crash> rd ffff880be23cd900 128
    ffff880be23cd900:  0000000000000000 0000000000000000   ................
    ffff880be23cd910:  0000000000000000 0000000000000000   ................
    ffff880be23cd920:  0000000000666473 0000000000000000   sdf.............
    ffff880be23cd930:  0000000000000000 0000000000000000   ................
    ffff880be23cd940:  0000000000000000 ffff88040000051d   ................
    ffff880be23cd950:  0000000300000004 ffff880b356be9c0   ..........k5....
    ffff880be23cd960:  6664732f7665642f 0000000000000000   /dev/sdf........
    ffff880be23cd970:  0000000000000000 0000000000000000   ................
    ffff880be23cd980:  0000000000000000 0000000000000000   ................--------possible source
    ffff880be23cd990:  0000000000000000 0000000000000000   ................
    ffff880be23cd9a0:  0000000000000000 0000000000000000   ................
    ffff880be23cd9b0:  0000000000000000 0000000000000000   ................
    ffff880be23cd9c0:  ffff880bee40a000 0000010000000000   ..@.............
    ffff880be23cd9d0:  ffff88049dcd2000 0000000b00000100   . ..............
    ffff880be23cd9e0:  0000000000000000 0000000000000000   ................
    ffff880be23cd9f0:  0000000000000000 0000000000000000   ................
    
    crash> rd ffff880799a9e200 128
    ffff880799a9e200:  0000002800000000 006e280a00000000   ....(........(n.
    ffff880799a9e210:  000005240000230d 20f6cb8000000555   .#..$...U......
    ffff880799a9e220:  002c000000002e1c 006e127600000000   ......,.....v.n.
    ffff880799a9e230:  0000000000000000 0000000000000000   ................
    ffff880799a9e240:  0000000000000000 ffff88040000051d   ................------------------already allocated
    ffff880799a9e250:  0000000300000004 ffff880b356be9c0   ..........k5....
    ffff880799a9e260:  6664732f7665642f 0000000000000000   /dev/sdf........
    ffff880799a9e270:  0000000000000000 0000000000000000   ................
    ffff880799a9e280:  0000000000000000 0000000000000000   ................-------------destination address
    ffff880799a9e290:  0000000000000000 0000000000000000   ................
    ffff880799a9e2a0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2b0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2c0:  ffff880bee40a000 0000010000000000   ..@.............
    ffff880799a9e2d0:  ffff88049dcd2000 0000000b00000100   . ..............
    ffff880799a9e2e0:  0000000000000000 0000000000000000   ................
    ffff880799a9e2f0:  0000000000000000 0000000000000000   ................
    ffff880799a9e300:  00000000000f0015 30305f3661adaa00   ...........a6_00
    ffff880799a9e310:  5f613030305f3031 3130303030303030   10_000a_00000001

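     One possibility is what I analyzed above: a memcpy from the source address, with our 9e280 as the destination. Another possibility is the reverse, because we do not actually know which side is the source. It may even have been overwritten several times, i.e. copied more than once.

    I feel I cannot take this analysis any further; I do not know how to analyze this kind of memory corruption. The data has no obvious signature. A colleague ran into this problem, and failing to help him both times feels bad. My skills are not good enough yet, so I will keep practicing.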
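    One way to put a rough bound on how much was copied is to compare the two rd dumps word by word at equal offsets and look for the longest run that agrees. The sketch below is only an aid for that comparison, not a way to tell source from destination; `parse_rd` and `longest_matching_run` are hypothetical helpers, and the parser assumes 64-bit rd output with two words per line and contiguous addresses.

```python
import re

WORD = re.compile(r"[0-9a-f]{16}")


def parse_rd(text, words_per_line=2):
    """Parse 64-bit `rd` output into (base address, list of words)."""
    base = None
    words = []
    for line in text.splitlines():
        head, sep, rest = line.strip().partition(":")
        if not sep:
            continue  # prompt lines such as "crash> rd ..." have no address prefix
        try:
            addr = int(head, 16)
        except ValueError:
            continue
        tokens = rest.split()[:words_per_line]  # ignore the ASCII column and notes
        if len(tokens) < words_per_line or not all(WORD.fullmatch(t) for t in tokens):
            continue
        if base is None:
            base = addr
        words.extend(int(t, 16) for t in tokens)
    return base, words


def longest_matching_run(a, b):
    """Return (start index, length) of the longest run where a[i] == b[i]."""
    best_start, best_len = 0, 0
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i
            if i - start + 1 > best_len:
                best_start, best_len = start, i - start + 1
        else:
            start = None
    return best_start, best_len
```

    Applied to the first 0x100 bytes of the two dumps above, the words agree from offset 0x30 through 0xff, and the first non-zero word that agrees is at offset 0x48 (ffff88040000051d). Zero words match trivially, so the start of the run is only suggestive, but it does show that the identical content is not limited to 280 and 2c0.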

     
    My knowledge is limited; if you find any mistakes, please let me know. All rights reserved. When reposting, please include the original address of this article.
  • Original article: https://www.cnblogs.com/10087622blog/p/9491497.html