• Redhat Crash Utility-Ramdump


    Redhat Crash Utility

    edit by liaoye@2014/9/16  

    http://blog.csdn.net/paul_liao



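    Crash utility is an open-source ramdump analysis tool provided by Red Hat. Official site: http://people.redhat.com/anderson/ , where the source code can be downloaded and built. Ramdumps from Spreadtrum, Marvell and MTK platforms can be analyzed with Crash utility; Qualcomm has its own tools, or Trace32 can be used.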


    Building Crash utility
    1. Required tools
    sudo apt-get install libaio-dev  libncurses5-dev  zlib1g-dev liblzma-dev  flex bison byacc


    2. Extract the archive and build
    tar zxvf crash-7.0.8.tar.gz
    cd crash-7.0.8

    make target=ARM

    If a 64-bit target is needed:

    make target=ARM64

    3. Build the external libraries (extensions)

    make extensions target=ARM64

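    Capturing a ramdump on Spreadtrum
    When a kernel panic occurs, the system automatically saves the ramdump under the sysdump directory of the log on the T-card (SD card), as two files.

    Before crash utility can parse it, the files must be merged into a single dump file: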
    cat sysdump.core.0* > dump.bin
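
    Where a shell with cat is not at hand, the same merge can be sketched in Python. This is a minimal illustration, not a vendor tool: the function name merge_dump_parts is made up here, and it assumes, as the shell glob does, that sorting the part names alphabetically gives the correct order.

```python
import glob
import shutil


def merge_dump_parts(pattern, out_path):
    """Concatenate every file matching pattern, in sorted name order, into out_path."""
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(out_path, "wb") as out:
        for part in parts:
            # Stream each part so a large dump is never held in memory at once.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts
```

    Calling merge_dump_parts("sysdump.core.0*", "dump.bin") in the directory holding the parts corresponds to the cat command above.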

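    Capturing a ramdump on Marvell
    When a kernel panic occurs, the system automatically enters EMMD dump mode. If an SD card is detected, the screen shows "EMMD SD DUMP", the system saves the whole memory to the SD card and then powers off, and RAMDUMP0000.gz can be taken from the SD card. Otherwise the screen shows "EMMD USB DUMP", and the memory is dumped over a USB connection to a PC with the fastboot tool.
    Linux: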
    # fastboot-linux-marvell dump dump.bin
    Windows:
    D:\fastboot_windows>fastboot-windows-marvell.exe dump dump.bin

    Capturing a ramdump on MTK

    a. Enable the ramdump mechanism

    The following code needs to be added:

    diff --git a/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c b/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
    index 8b2b93a..2ec509f 100644
    --- a/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
    +++ b/alps/kernel-3.10/drivers/misc/mediatek/aee/mrdump/mrdump_full.c
    @@ -457,6 +457,17 @@ static int __init mrdump_init(void)
            }
     
            atomic_notifier_chain_register(&panic_notifier_list, &mrdump_panic_blk);
    +       //add this block
    +
    +       {
    +               mrdump_enable = 1;
    +
    +               mrdump_plat->hw_enable(mrdump_enable);
    +
    +               mrdump_cb->machdesc.nr_cpus = NR_CPUS;
    +
    +               __inner_flush_dcache_all();
    +       }
            return 0;
     }

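    Enable the following config options:

    +CONFIG_MTK_AEE_POWERKEY_HANG_DETECT=y
    +CONFIG_MTK_AEE_MRDUMP=y
    +CONFIG_MTK_MRDUMP=y
    +CONFIG_MTK_DBG_DUMP=y

    In addition, CONFIG_MTK_AEE_IPANIC must be disabled: when it is enabled, sys_mini_dump is generated and sys_core_dump is not.

    Run cat /sys/module/mrdump/parameters/enable to confirm that the setting has taken effect.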

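    b. Capture the ramdump

    After a kernel panic or oops, the device reboots into lkramdump mode and dumps RAM to /data/No_Delete.rdmp, which is then collected into the mtklog/aee_exp/db* files. Export them with the gat tool and extract SYS_COREDUMP.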


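    Capturing a ramdump on Qualcomm

    After a kernel panic or oops, the device reboots into ramdump mode, and the ramdump is then exported with the QPST tool. Qualcomm provides the analysis tools linux ramdump parser and crashscope for simple analysis; more complex analysis requires Trace32.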


    Using crash utility
    Detailed usage documentation is available at http://people.redhat.com/anderson/crash_whitepaper for reference. Below are some commonly used operations.

    1. Enter the crash command line: ./crash-arm  vmlinux  dump.bin
    paul@paul-VirtualBox:~$ ./crash-arm  vmlinux  dump.bin 


    crash-arm 7.0.5
    Copyright (C) 2002-2014  Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
    Copyright (C) 1999-2006  Hewlett-Packard Co
    Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
    Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
    Copyright (C) 2005, 2011  NEC Corporation

    Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions.  Enter "help copying" to see the conditions.
    This program has absolutely no warranty.  Enter "help warranty" for details.

    GNU gdb (GDB) 7.6
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-elf-linux"...

          KERNEL: vmlinux                           
        DUMPFILE: dump.bin
            CPUS: 1
            DATE: Wed Jan  1 10:26:26 2014
          UPTIME: 00:34:14
    LOAD AVERAGE: 3.61, 3.59, 3.16
           TASKS: 650
        NODENAME: localhost
         RELEASE: 3.10.33
         VERSION: #4 SMP PREEMPT Wed Sep 10 14:44:32 CST 2014
         MACHINE: armv7l  (unknown Mhz)
          MEMORY: 512 MB
           PANIC: "c0 4233 (sh) Internal error: Oops: 805 [#1] PREEMPT SMP ARM" (check log for details)

             PID: 4233
         COMMAND: "sh"
            TASK: d37f7b40  [THREAD_INFO: cf512000]
             CPU: 0
           STATE: TASK_RUNNING (PANIC)

    crash-arm>
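    crash-arm is the crash binary built earlier, and dump.bin is the captured ramdump. vmlinux must come from exactly the same kernel build as dump.bin, otherwise the dump cannot be parsed.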


    2. Run the log command at the prompt to get the kmsg
    crash-arm> log  
    or
    crash-arm> log > kmsg

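    3. bt prints the call stack; the information in it can be used to reconstruct the state at the time of the crash and locate the problem.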
    crash-arm> bt
    PID: 37     TASK: db34a640  CPU: 0   COMMAND: "kworker/u8:1"
     #0 [<c016ad38>] (try_to_suspend) from [<c0143a5c>]
     #1 [<c0143a5c>] (process_one_work) from [<c0144138>]
     #2 [<c0144138>] (worker_thread) from [<c0149c94>]
     #3 [<c0149c94>] (kthread) from [<c010f498>]

    crash-arm> bt -f
    PID: 37     TASK: db34a640  CPU: 0   COMMAND: "kworker/u8:1"
     #0 [<c016ad38>] (try_to_suspend) from [<c0143a5c>]
        [PC: c016ad38  LR: c0143a5c  SP: db391ee8  SIZE: 16]
        db391ee8: 00000838 c0a5f01c db367080 c0143a5c 
     #1 [<c0143a5c>] (process_one_work) from [<c0144138>]
        [PC: c0143a5c  LR: c0144138  SP: db391ef8  SIZE: 56]

        db391ef8: c2907600 c0a7be74 00000001 00000000 
        db391f08: 00000000 db367080 db80ec14 db367098 
        db391f18: db390000 db390000 c0ab39a3 00000001 
        db391f28: db80ec00 c0144138 

     #2 [<c0144138>] (worker_thread) from [<c0149c94>]
        [PC: c0144138  LR: c0149c94  SP: db391f30  SIZE: 56]
        db391f30: c0144000 00000000 00000000 db390000 
        db391f40: db391f64 db8b3e98 00000000 db367080 
        db391f50: c0144000 00000000 00000000 00000000 
        db391f60: 00000000 c0149c94 

     #3 [<c0149c94>] (kthread) from [<c010f498>]
        [PC: c0149c94  LR: c010f498  SP: db391f68  SIZE: 72]
        db391f68: 04000000 00000000 00000000 db367080 
        db391f78: 00000000 00000000 db391f80 db391f80 
        db391f88: 00000000 00000000 db391f90 db391f90 
        db391f98: db391fac db8b3e98 c0149bf0 00000000 
        db391fa8: 00000000 c010f498

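    PC: program counter, points to the instruction currently being executed;
    LR: link register, holds the return address, i.e. the instruction to execute after the current function returns;
    SP: stack pointer. The Linux stack grows downward, from high addresses to low addresses.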
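
    As a quick sanity check on the downward-growing stack, the SP and SIZE values printed by bt -f should chain together: each frame's SP plus its SIZE should equal its caller's SP. A minimal Python sketch, with the values copied by hand from the output above (the helper name frames_are_contiguous is made up for illustration):

```python
# (SP, SIZE) for frames #0..#3, copied from the bt -f output
frames = [
    (0xDB391EE8, 16),  # #0 try_to_suspend
    (0xDB391EF8, 56),  # #1 process_one_work
    (0xDB391F30, 56),  # #2 worker_thread
    (0xDB391F68, 72),  # #3 kthread
]


def frames_are_contiguous(frames):
    """True if every frame ends exactly where its caller's frame begins."""
    return all(
        sp + size == caller_sp
        for (sp, size), (caller_sp, _) in zip(frames, frames[1:])
    )


print(frames_are_contiguous(frames))
```

    The callee (#0) sits at the lowest address and each caller sits above it, which is what "grows from high to low" looks like in a dump.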


    Let us analyze the stack data of frame #1 (process_one_work) shown above. First, disassembling vmlinux gives:
    static void process_one_work(struct worker *worker, struct work_struct *work)
    __releases(&pool->lock)
    __acquires(&pool->lock)
    {
    c0143928:   e92d4ff0    push    {r4, r5, r6, r7, r8, r9, sl, fp, lr}
    c014392c:   e1a05001    mov r5, r1
    c0143930:   e5913000    ldr r3, [r1]
    c0143934:   e24dd014    sub sp, sp, #20
    c0143938:   e1a04000    mov r4, r0

    ……


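    Reading from the end backwards, the words are lr, fp, sl, r9, r8, r7, r6, r5, r4, the registers saved by the push. The remaining words were placed on the stack afterwards (the 20 bytes reserved by sub sp, sp, #20) and can be identified by comparing against the disassembly.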
       c2907600 c0a7be74 00000001 00000000 
        00000000 db367080 db80ec14 db367098 
        db390000 db390000 c0ab39a3 00000001 
        db80ec00 c0144138 
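
    The mapping described above can be sketched in Python. This is only an illustration under two assumptions taken from the prologue: push stores the lowest-numbered register at the lowest address, and the 20 bytes from sub sp, sp, #20 lie below the saved registers. The helper name split_frame is made up for this example.

```python
# Frame #1 (process_one_work) words from bt -f, lowest address first
words = [
    0xC2907600, 0xC0A7BE74, 0x00000001, 0x00000000,
    0x00000000, 0xDB367080, 0xDB80EC14, 0xDB367098,
    0xDB390000, 0xDB390000, 0xC0AB39A3, 0x00000001,
    0xDB80EC00, 0xC0144138,
]

# Register list of the push, in ascending address order
PUSHED = ["r4", "r5", "r6", "r7", "r8", "r9", "sl", "fp", "lr"]
LOCAL_BYTES = 20  # from "sub sp, sp, #20"


def split_frame(words, pushed, local_bytes):
    """Split a frame into (local words, {register: saved value})."""
    n_local = local_bytes // 4
    local_words = words[:n_local]
    saved = dict(zip(pushed, words[n_local:n_local + len(pushed)]))
    return local_words, saved


local_words, saved = split_frame(words, PUSHED, LOCAL_BYTES)
for reg, value in saved.items():
    print(f"{reg:>2} = {value:08x}")
```

    The saved lr comes out as c0144138, the return address into worker_thread reported by bt, and 5 local words plus 9 saved registers account for the 56-byte frame size.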

    4. The struct command: using the call stack information above, the related data can be recovered, for example a struct work_struct.


    crash-arm> struct work_struct c0a5f02c
    struct work_struct {
      data = {
        counter = 0
      }, 
      entry = {
        next = 0x0, 
        prev = 0xc0a5f034 <autosleep_lock+8>
      }, 
      func = 0xc0a5f034 <autosleep_lock+8>
    }

    5. whatis prints a function prototype
    crash-arm> whatis try_to_suspend 
    void try_to_suspend(struct work_struct *);


    6. Extract logcat
    Load the external logcat.so extension:
    crash-arm> extend logcat.so
    crash-arm> logcat

    7. help: more commands can be looked up by typing help, or at http://people.redhat.com/anderson/crash_whitepaper


    Case study
    1. A kernel panic can be triggered by adding a NULL pointer dereference, or with echo c > /proc/sysrq-trigger. I made the following change in the code:
    +++ kernel/power/autosleep.c
    @@ -26,12 +30,16 @@
     static void try_to_suspend(struct work_struct *work)
     {
            unsigned int initial_count, final_count;
    +       int *p = 0;
     
            if (!pm_get_wakeup_count(&initial_count, true))
                    goto out;
     
            mutex_lock(&autosleep_lock);
     
    +       if (work->func != NULL)
    +               *p = 6;
    +
            if (!pm_save_wakeup_count(initial_count) ||
    When work->func is not NULL (this is only for the experiment; work->func is never NULL here), writing through the pointer p, which points to address 0, causes a panic.




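    2. Run the log command; the exact location of the panic can be found in the kmsg output:
    PC is at try_to_suspend+0x38/0xe0  
    pc : [<c016ad38>]
    0x38 is the offset into the function, and 0xe0 is the total length of try_to_suspend.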
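
    The symbol+offset/size notation gives enough to recover the function's address range from the pc value. A small Python sketch of that arithmetic follows; the function name parse_pc_line is made up for illustration and the regular expression covers only the "PC is at" form shown here.

```python
import re


def parse_pc_line(text, pc):
    """Return (symbol, start, end) from a 'PC is at sym+0xOFF/0xSIZE' line."""
    m = re.search(r"PC is at ([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", text)
    if m is None:
        raise ValueError("no 'PC is at' line found")
    symbol = m.group(1)
    offset = int(m.group(2), 16)  # distance of pc from the function start
    size = int(m.group(3), 16)    # total length of the function
    start = pc - offset
    return symbol, start, start + size


symbol, start, end = parse_pc_line("PC is at try_to_suspend+0x38/0xe0", 0xC016AD38)
print(f"{symbol}: {start:08x}..{end:08x}")
```

    For this dump the range works out to c016ad00..c016ade0, which is where try_to_suspend should appear in the disassembly of vmlinux.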

    [   82.566833] c0 37 (kworker/u8:1) Unable to handle kernel NULL pointer dereference at virtual address 00000000
    [   82.577697] c0 37 (kworker/u8:1) pgd = c0104000
    [   11.830322] c0 37 (kworker/u8:1) SEH:seh_api_ioctl_handler 6
    [   82.582458] c0 37 (kworker/u8:1) [00000000] *pgd=00000000
    [   82.587860] c0 37 (kworker/u8:1)
    [   82.589965] c0 37 (kworker/u8:1) Internal error: Oops: 805 [#1] PREEMPT SMP ARM
    [   82.597259] c0 37 (kworker/u8:1) Modules linked in: audiostub cidatattydev gs_modem ccinetdev cci_datastub citty iml_module seh cploaddev msocketk geu galcore(O)
    [   82.610107] c0 37 (kworker/u8:1) CPU: 0 PID: 37 Comm: kworker/u8:1 Tainted: G        W  O 3.10.33 #51
    [   82.619354] c0 37 (kworker/u8:1) Workqueue: autosleep try_to_suspend
    [   82.623901] c0 37 (kworker/u8:1) task: db34a640 ti: db390000 task.ti: db390000
    [   82.631164] c0 37 (kworker/u8:1) PC is at try_to_suspend+0x38/0xe0
    [   82.637359] c0 37 (kworker/u8:1) LR is at try_to_suspend+0x28/0xe0
    [   82.643585] c0 37 (kworker/u8:1) pc : [<c016ad38>]    lr : [<c016ad28>]    psr: a00e0013
                   sp : db391ee8  ip : 00000000  fp : 00000000
    [   82.656921] c0 37 (kworker/u8:1) r10: db2a5400  r9 : 00000000  r8 : db390000
    [   82.664001] c0 37 (kworker/u8:1) r7 : db80ec00  r6 : c0ab3d34  r5 : c0a5f01c  r4 : c0a5f01c
    [   82.672393] c0 37 (kworker/u8:1) r3 : 00000000  r2 : 00000006  r1 : 200e0013  r0 : c0a5f02c
    [   82.680755] c0 37 (kworker/u8:1) Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel


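    3. Disassemble vmlinux
    arm-linux-androideabi-objdump -C -S vmlinux > vmlinux-dump 
    Searching the output for address c016ad38 shows that the panic occurred while executing the instruction below. From the kmsg we know r3 : 00000000 and r2 : 00000006; a write to address 0x0 is certainly illegal.
    c016ad38:   15832000    strne   r2, [r3]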
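
    To see how the machine word 15832000 corresponds to strne r2, [r3], the relevant bit fields can be pulled apart by hand. The Python sketch below is illustrative only: it handles just the A32 load/store word/byte encoding with an immediate offset, and the function name decode_single_transfer is invented for this example.

```python
# Condition mnemonics for the top four bits (0b1111 is not a condition)
COND = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
        "hi", "ls", "ge", "lt", "gt", "le", "al"]


def decode_single_transfer(word):
    """Decode an A32 LDR/STR with immediate offset; None for anything else."""
    cond = word >> 28
    is_single_transfer = (word >> 26) & 0b11 == 0b01
    register_offset = (word >> 25) & 1
    if cond == 0xF or not is_single_transfer or register_offset:
        return None
    sign = 1 if (word >> 23) & 1 else -1        # U bit: add or subtract offset
    return {
        "cond": COND[cond],
        "op": "ldr" if (word >> 20) & 1 else "str",  # L bit
        "byte": bool((word >> 22) & 1),              # B bit
        "rn": (word >> 16) & 0xF,                    # base register
        "rd": (word >> 12) & 0xF,                    # source/destination register
        "offset": sign * (word & 0xFFF),
    }


insn = decode_single_transfer(0x15832000)
print(insn)
```

    The fields decode to a conditional (ne) word store of r2 to the address in r3 plus offset 0. With r3 = 00000000 in the register dump, the effective address is 0, which matches the NULL pointer dereference reported in the log.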

    *p = 6 is executed only when work->func != NULL. On entry to try_to_suspend(), R0 holds its argument, a struct work_struct *. For why R0~R3 are used to carry function arguments, look up the APCS standard.


    if (work->func != NULL) 
    *p = 6;
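    Running struct work_struct c0a5f02c (the R0 value in the log) interprets that memory as a struct work_struct, and the func field is clearly not NULL. One caveat: R0 is caller-saved and the fault happens after the call to mutex_lock(&autosleep_lock). The <autosleep_lock+8> annotations show that c0a5f02c is the address of autosleep_lock itself, so this dump is probably the mutex rather than the work item; the work pointer is more likely the value preserved in R4/R5 (c0a5f01c).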
    crash-arm> struct work_struct c0a5f02c
    struct work_struct {
      data = {

        counter = 0
      }, 
      entry = {
        next = 0x0, 
        prev = 0xc0a5f034 <autosleep_lock+8>
      }, 
      func = 0xc0a5f034 <autosleep_lock+8>
    }

    The above is only a simple example for learning purposes; panics met in real debugging will not be as simple as this one.



    References:
    http://blog.csdn.net/keyboardota/article/details/6799054
    http://people.redhat.com/anderson/crash_whitepaper

  • Original source: https://www.cnblogs.com/yjbjingcha/p/6853199.html