• coredump: configuration, generation, analysis, and a worked example


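    Keywords: coredump, core_pattern, coredump_filter, and so on.

    When an application exits because of an exception or a bug, the kernel writes a core file, provided certain conditions are met.

    A core file usually contains the program's memory at the time of the failure, the register state, the stack pointer, memory-management information, and the function call stack.

    In other words, a core file is a snapshot of the program's working state saved to a file. Analyzing it with tools reveals the call stack at the moment the program exited abnormally, which helps locate and fix the problem.

    1. Configuring coredump

    Coredump is controlled through ulimit. ulimit -c prints the current core file size limit; a value of 0 means coredump is disabled.

    ulimit -c unlimited enables coredump.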
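    The same limit can be changed from inside a program. The sketch below is a minimal illustration, assuming a POSIX system with getrlimit()/setrlimit(); the function names enable_core_dumps() and core_dumps_enabled() are made up for this example. It raises the soft RLIMIT_CORE limit to the hard limit, which is the most an unprivileged process can do.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_CORE limit to the hard limit. This is the
 * programmatic counterpart of "ulimit -c unlimited", bounded by whatever
 * hard limit the process inherited.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;  /* the soft limit may be raised up to the hard limit */
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Returns 1 if the current soft limit allows a core file to be written. */
int core_dumps_enabled(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return 0;
    return rl.rlim_cur != 0;
}
```

    Calling enable_core_dumps() early in main() makes the process independent of the shell's ulimit setting, as long as the hard limit is not 0.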

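    By default the core file is written to the current working directory of the process, and its name is core.

    Both the location and the name can be changed through /proc/sys/kernel/core_pattern, which accepts format specifiers such as:

    %p  PID of the dumping process
    %u  UID of the dumping process
    %s  number of the signal that caused the dump
    %t  time of the dump, in seconds since 1970-01-01 00:00:00
    %e  executable file name of the dumping process (taken from the task's comm value)

    For example: echo "core-%e-%p-%s-%t" > /proc/sys/kernel/core_pattern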
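    To make the effect of these specifiers concrete, here is a user-space sketch that expands a pattern the way the list above describes. It is an illustration only, not the kernel's code: struct core_info and expand_core_pattern() are hypothetical names, only the five specifiers listed above plus %% are handled, and the kernel's escaping of '/' in %e is not reproduced.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the values that would be substituted. */
struct core_info {
    int pid;            /* %p */
    unsigned int uid;   /* %u */
    int signo;          /* %s */
    long long time;     /* %t, seconds since the Epoch */
    const char *comm;   /* %e */
};

/* Expand pattern into out (size bytes, NUL-terminated).
 * Returns 0 on success, -1 if out is too small.
 * "%%" yields '%'; unknown specifiers and a trailing '%' are dropped. */
int expand_core_pattern(const char *pattern, const struct core_info *ci,
                        char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    out[0] = '\0';
    while (*pattern) {
        char piece[64];
        const char *src = piece;

        if (*pattern != '%') {
            piece[0] = *pattern++;      /* literal character */
            piece[1] = '\0';
        } else {
            pattern++;                  /* skip the '%' */
            if (*pattern == '\0')
                break;                  /* single '%' at the end: drop it */
            switch (*pattern) {
            case '%': strcpy(piece, "%"); break;
            case 'p': snprintf(piece, sizeof(piece), "%d", ci->pid); break;
            case 'u': snprintf(piece, sizeof(piece), "%u", ci->uid); break;
            case 's': snprintf(piece, sizeof(piece), "%d", ci->signo); break;
            case 't': snprintf(piece, sizeof(piece), "%lld", ci->time); break;
            case 'e': src = ci->comm; break;
            default:  piece[0] = '\0'; break;
            }
            pattern++;
        }

        size_t len = strlen(src);
        if (used + len + 1 > size)
            return -1;                  /* no room for the piece and the NUL */
        memcpy(out + used, src, len + 1);
        used += len;
    }
    return 0;
}
```

    With pid 8651, signal 11, time 1562000000 and comm main.out, the pattern core-%e-%p-%s-%t expands to core-main.out-8651-11-1562000000.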

    Every process has a coredump_filter node at /proc/<pid>/coredump_filter.

    coredump_filter is a bit mask that selects which kinds of memory are written to the core file when the process dumps:

      - (bit 0) anonymous private memory
      - (bit 1) anonymous shared memory
      - (bit 2) file-backed private memory
      - (bit 3) file-backed shared memory
      - (bit 4) ELF header pages in file-backed private memory areas (it is effective only if the bit 2 is cleared)
      - (bit 5) hugetlb private memory
      - (bit 6) hugetlb shared memory
      - (bit 7) DAX private memory
      - (bit 8) DAX shared memory

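    The default value of coredump_filter is 0x33 (bits 0, 1, 4 and 5), so a coredump saves all anonymous memory, the ELF header pages, and hugetlb private memory.

    coredump_filter is inherited by child processes. echo 0xXX > /proc/self/coredump_filter sets the filter of the current process.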
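    The bit list above can be turned into a small decoder. The following sketch is illustrative only; describe_coredump_filter() and read_own_coredump_filter() are names invented for this example, and the second function assumes a Linux /proc file system that exposes the filter as a hexadecimal number.

```c
#include <stdio.h>
#include <string.h>

/* Names of coredump_filter bits 0..8, in the order of the list above. */
static const char *const filter_bit_names[] = {
    "anonymous private", "anonymous shared",
    "file-backed private", "file-backed shared",
    "ELF headers", "hugetlb private", "hugetlb shared",
    "DAX private", "DAX shared",
};

/* Write a comma-separated list of the memory types selected by mask.
 * Returns the number of known bits set, or -1 if out is too small. */
int describe_coredump_filter(unsigned int mask, char *out, size_t size)
{
    size_t nbits = sizeof(filter_bit_names) / sizeof(filter_bit_names[0]);
    size_t used = 0;
    int count = 0;

    if (size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < nbits; i++) {
        if (!(mask & (1u << i)))
            continue;
        int n = snprintf(out + used, size - used, "%s%s",
                         count ? ", " : "", filter_bit_names[i]);
        if (n < 0 || (size_t)n >= size - used)
            return -1;                  /* output was truncated */
        used += (size_t)n;
        count++;
    }
    return count;
}

/* Read the filter of the calling process. Returns 0 on success, -1 on error. */
int read_own_coredump_filter(unsigned int *mask)
{
    FILE *f = fopen("/proc/self/coredump_filter", "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%x", mask) == 1);  /* the file holds a hexadecimal value */
    fclose(f);
    return ok ? 0 : -1;
}
```

    For the default value 0x33 the decoder reports four entries: anonymous private, anonymous shared, ELF headers, hugetlb private.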

    static ssize_t proc_coredump_filter_write(struct file *file,
                          const char __user *buf,
                          size_t count,
                          loff_t *ppos)
    {
    ...
        ret = kstrtouint_from_user(buf, count, 0, &val);    /* parse the user buffer into val */
        if (ret < 0)
            return ret;
    ...
        for (i = 0, mask = 1; i < MMF_DUMP_FILTER_BITS; i++, mask <<= 1) {
            if (val & mask)
                set_bit(i + MMF_DUMP_FILTER_SHIFT, &mm->flags);    /* map each coredump_filter bit onto mm->flags for use at dump time */
            else
                clear_bit(i + MMF_DUMP_FILTER_SHIFT, &mm->flags);
        }
    ...
    }

    MMF_DUMP_FILTER_SHIFT is 2, so bit n of coredump_filter corresponds to bit n + 2 of mm->flags:

    #define MMF_DUMP_ANON_PRIVATE    2
    #define MMF_DUMP_ANON_SHARED    3
    #define MMF_DUMP_MAPPED_PRIVATE    4
    #define MMF_DUMP_MAPPED_SHARED    5
    #define MMF_DUMP_ELF_HEADERS    6
    #define MMF_DUMP_HUGETLB_PRIVATE 7
    #define MMF_DUMP_HUGETLB_SHARED  8
    #define MMF_DUMP_DAX_PRIVATE    9
    #define MMF_DUMP_DAX_SHARED    10
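    The shift can be checked with a few lines of user-space C that mirror the loop in proc_coredump_filter_write(). This is a sketch under assumptions: apply_coredump_filter() is a made-up name, plain bit operations stand in for the kernel's set_bit()/clear_bit(), and MMF_DUMP_FILTER_BITS is taken to be 9 to match the nine filter bits listed earlier.

```c
#define MMF_DUMP_FILTER_SHIFT 2   /* value stated in the text above */
#define MMF_DUMP_FILTER_BITS  9   /* assumption: one per filter bit 0..8 */

/* Return flags with the coredump_filter value val copied into bits
 * MMF_DUMP_FILTER_SHIFT .. MMF_DUMP_FILTER_SHIFT + MMF_DUMP_FILTER_BITS - 1.
 * All other bits of flags (for example the two dumpable bits) are kept. */
unsigned long apply_coredump_filter(unsigned long flags, unsigned int val)
{
    unsigned int mask = 1;

    for (int i = 0; i < MMF_DUMP_FILTER_BITS; i++, mask <<= 1) {
        unsigned long bit = 1UL << (i + MMF_DUMP_FILTER_SHIFT);

        if (val & mask)
            flags |= bit;
        else
            flags &= ~bit;
    }
    return flags;
}
```

    The default filter 0x33 therefore occupies bits 2, 3, 6 and 7 of the flags word, that is 0xCC, which matches MMF_DUMP_ANON_PRIVATE, MMF_DUMP_ANON_SHARED, MMF_DUMP_ELF_HEADERS and MMF_DUMP_HUGETLB_PRIVATE.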

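    2. How coredump works

    do_signal() decides from the signal whether a coredump is triggered; the core size limit, mm->flags and other settings also play a part.

    Once the conditions are met, do_coredump() generates the core file. The actual writing is done by binfmt->core_dump().

    2.1 What triggers a coredump?

    When the kernel returns to user space, it calls do_signal() to handle pending signals.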

    static void do_signal(struct pt_regs *regs, int syscall)
    {
        unsigned int retval = 0, continue_addr = 0, restart_addr = 0;
        struct ksignal ksig;
    ...
        if (get_signal(&ksig)) {
    ...
        }
    ...
    }
    
    int get_signal(struct ksignal *ksig)
    {
    ...
        for (;;) {
            struct k_sigaction *ka;
    ...
            signr = dequeue_signal(current, &current->blocked, &ksig->info);
    ...
            /* Trace actually delivered signals. */
            trace_signal_deliver(signr, &ksig->info, ka);
    ...
            if (sig_kernel_coredump(signr)) {
                if (print_fatal_signals)    /* enabled with kernel.print-fatal-signals = 1, i.e. /proc/sys/kernel/print-fatal-signals */
                    print_fatal_signal(ksig->info.si_signo);    /* log the signal together with the register and stack state at this point */
                proc_coredump_connector(current);
                do_coredump(&ksig->info);
            }
    ...
        }
        spin_unlock_irq(&sighand->siglock);
    
        ksig->sig = signr;
        return ksig->sig > 0;
    }
    
    #define sig_kernel_coredump(sig)    siginmask(sig, SIG_KERNEL_COREDUMP_MASK)

    #define SIG_KERNEL_COREDUMP_MASK (\
        rt_sigmask(SIGQUIT) | rt_sigmask(SIGILL) | \
        rt_sigmask(SIGTRAP) | rt_sigmask(SIGABRT) | \
        rt_sigmask(SIGFPE) | rt_sigmask(SIGSEGV) | \
        rt_sigmask(SIGBUS) | rt_sigmask(SIGSYS) | \
        rt_sigmask(SIGXCPU) | rt_sigmask(SIGXFSZ) | \
        SIGEMT_MASK )

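    get_signal() checks whether the signal leads to a coredump. The signals that do are SIGQUIT, SIGILL, SIGTRAP, SIGABRT, SIGFPE, SIGSEGV, SIGBUS, SIGSYS, SIGXCPU and SIGXFSZ, plus SIGEMT on architectures that define it.

    "Terminate w/core" means that a memory image of the process is left in a file named core in the current working directory of the process (the file name core shows how long this feature has been part of UNIX).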
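    As a quick user-space reference, the mask can be written as a lookup function. This is only a sketch of the list above, not kernel code: signal_dumps_core() is a hypothetical name, SIGEMT is left out because Linux does not define it on most architectures, and the result describes the default action only (a handler installed by the program changes the outcome).

```c
#include <signal.h>

/* Returns 1 if the default action of sig is "terminate with core dump"
 * according to SIG_KERNEL_COREDUMP_MASK, 0 otherwise. */
int signal_dumps_core(int sig)
{
    switch (sig) {
    case SIGQUIT:
    case SIGILL:
    case SIGTRAP:
    case SIGABRT:
    case SIGFPE:
    case SIGSEGV:
    case SIGBUS:
    case SIGSYS:
    case SIGXCPU:
    case SIGXFSZ:
        return 1;
    default:
        return 0;
    }
}
```

    For example, signal_dumps_core(SIGSEGV) is 1, while SIGTERM, SIGKILL and SIGINT terminate the process without a core file.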

    void proc_coredump_connector(struct task_struct *task)
    {
        struct cn_msg *msg;
        struct proc_event *ev;
        __u8 buffer[CN_PROC_MSG_SIZE] __aligned(8);
    
        if (atomic_read(&proc_event_num_listeners) < 1)
            return;
    
        msg = buffer_to_cn_msg(buffer);
        ev = (struct proc_event *)msg->data;
        memset(&ev->event_data, 0, sizeof(ev->event_data));
        ev->timestamp_ns = ktime_get_ns();
        ev->what = PROC_EVENT_COREDUMP;
        ev->event_data.coredump.process_pid = task->pid;
        ev->event_data.coredump.process_tgid = task->tgid;
    
        memcpy(&msg->id, &cn_proc_event_id, sizeof(msg->id));
        msg->ack = 0; /* not used */
        msg->len = sizeof(*ev);
        msg->flags = 0; /* not used */
        send_msg(msg);
    } 

    2.2 How is the coredump generated?

    void do_coredump(const siginfo_t *siginfo)
    {
        struct core_state core_state;
        struct core_name cn;
        struct mm_struct *mm = current->mm;
        struct linux_binfmt * binfmt;
        const struct cred *old_cred;
        struct cred *cred;
        int retval = 0;
        int ispipe;
        struct files_struct *displaced;
        /* require nonrelative corefile path and be extra careful */
        bool need_suid_safe = false;
        bool core_dumped = false;
        static atomic_t core_dump_count = ATOMIC_INIT(0);
        struct coredump_params cprm = {
            .siginfo = siginfo,
            .regs = signal_pt_regs(),
            .limit = rlimit(RLIMIT_CORE),    /* RLIMIT_CORE soft limit of the dumping process */
            /*
             * We must use the same mm->flags while dumping core to avoid
             * inconsistency of bit flags, since this flag is not protected
             * by any locks.
             */
            .mm_flags = mm->flags,
        };
    
        audit_core_dumps(siginfo->si_signo);
    
        binfmt = mm->binfmt;    /* binary format handler (program loader) of the current process */
        if (!binfmt || !binfmt->core_dump)
            goto fail;
        if (!__get_dumpable(cprm.mm_flags))    /* the low two bits of mm->flags hold the dumpable mode: SUID_DUMP_DISABLE (0) forbids dumping, any other value allows it */
            goto fail;
    
        cred = prepare_creds();
        if (!cred)
            goto fail;
        /*
         * We cannot trust fsuid as being the "true" uid of the process
         * nor do we know its entire history. We only know it was tainted
         * so we dump it as root in mode 2, and only into a controlled
         * environment (pipe handler or fully qualified path).
         */
        if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {    /* tell SUID_DUMP_ROOT apart from SUID_DUMP_USER */
            /* Setuid core dump mode */
            cred->fsuid = GLOBAL_ROOT_UID;    /* Dump root private */
            need_suid_safe = true;
        }
    
        retval = coredump_wait(siginfo->si_signo, &core_state);
        if (retval < 0)
            goto fail_creds;
    
        old_cred = override_creds(cred);
    
        ispipe = format_corename(&cn, &cprm);    /* expand core_pattern into the core file name and report whether the pattern is a pipe */
    
        if (ispipe) {    /* hand the core data to a user-space program through a pipe */
            int dump_count;
            char **helper_argv;
            struct subprocess_info *sub_info;
    
            if (ispipe < 0) {
                printk(KERN_WARNING "format_corename failed\n");
                printk(KERN_WARNING "Aborting core\n");
                goto fail_unlock;
            }
    
            if (cprm.limit == 1) {
                printk(KERN_WARNING
                    "Process %d(%s) has RLIMIT_CORE set to 1\n",
                    task_tgid_vnr(current), current->comm);
                printk(KERN_WARNING "Aborting core\n");
                goto fail_unlock;
            }
            cprm.limit = RLIM_INFINITY;
    
            dump_count = atomic_inc_return(&core_dump_count);
            if (core_pipe_limit && (core_pipe_limit < dump_count)) {
                printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
                       task_tgid_vnr(current), current->comm);
                printk(KERN_WARNING "Skipping core dump\n");
                goto fail_dropcount;
            }
    
            helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL);    /* split cn.corename into an argv array */
            if (!helper_argv) {
                printk(KERN_WARNING "%s failed to allocate memory\n",
                       __func__);
                goto fail_dropcount;
            }
    
            retval = -ENOMEM;
            /* run the user-space program helper_argv[0] named in core_pattern through usermodehelper */
            sub_info = call_usermodehelper_setup(helper_argv[0],
                            helper_argv, NULL, GFP_KERNEL,
                            umh_pipe_setup, NULL, &cprm);
            /* UMH_WAIT_EXEC: return as soon as the helper has been exec'ed, without waiting for it to finish; the helper then reads the core data from the pipe */
            if (sub_info)
                retval = call_usermodehelper_exec(sub_info,
                                  UMH_WAIT_EXEC);
    
            argv_free(helper_argv);
            if (retval) {
                printk(KERN_INFO "Core dump to |%s pipe failed\n",
                       cn.corename);
                goto close_fail;
            }
        } else {
            struct inode *inode;
            int open_flags = O_CREAT | O_RDWR | O_NOFOLLOW |
                     O_LARGEFILE | O_EXCL;
    
            if (cprm.limit < binfmt->min_coredump)
                goto fail_unlock;
    
            if (need_suid_safe && cn.corename[0] != '/') {
                printk(KERN_WARNING "Pid %d(%s) can only dump core "
                    "to fully qualified path!\n",
                    task_tgid_vnr(current), current->comm);
                printk(KERN_WARNING "Skipping core dump\n");
                goto fail_unlock;
            }
    
            if (!need_suid_safe) {
                mm_segment_t old_fs;
    
                old_fs = get_fs();
                set_fs(KERNEL_DS);
                /*
                 * If it doesn't exist, that's fine. If there's some
                 * other problem, we'll catch it at the filp_open().
                 */
                (void) sys_unlink((const char __user *)cn.corename);
                set_fs(old_fs);
            }
    
            if (need_suid_safe) {    /* create and open the core file */
                struct path root;
    
                task_lock(&init_task);
                get_fs_root(init_task.fs, &root);
                task_unlock(&init_task);
                cprm.file = file_open_root(root.dentry, root.mnt,
                    cn.corename, open_flags, 0600);
                path_put(&root);
            } else {
                cprm.file = filp_open(cn.corename, open_flags, 0600);
            }
            if (IS_ERR(cprm.file))
                goto fail_unlock;
    
            inode = file_inode(cprm.file);
            if (inode->i_nlink > 1)    /* the core file must not have more than one hard link */
                goto close_fail;
            if (d_unhashed(cprm.file->f_path.dentry))
                goto close_fail;
    
            if (!S_ISREG(inode->i_mode))    /* the core file must be a regular file */
                goto close_fail;
    
            if (!uid_eq(inode->i_uid, current_fsuid()))
                goto close_fail;
            if ((inode->i_mode & 0677) != 0600)
                goto close_fail;
            if (!(cprm.file->f_mode & FMODE_CAN_WRITE))    /* the core file must be writable */
                goto close_fail;
            if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
                goto close_fail;
        }
    
        /* get us an unshared descriptor table; almost always a no-op */
        retval = unshare_files(&displaced);
        if (retval)
            goto close_fail;
        if (displaced)
            put_files_struct(displaced);
        if (!dump_interrupted()) {
            file_start_write(cprm.file);
            core_dumped = binfmt->core_dump(&cprm);    /* the binary format's core_dump handler writes the data to cprm.file */
            file_end_write(cprm.file);
        }
        if (ispipe && core_pipe_limit)
            wait_for_dump_helpers(cprm.file);
    close_fail:
        if (cprm.file)
            filp_close(cprm.file, NULL);
    fail_dropcount:
        if (ispipe)
            atomic_dec(&core_dump_count);
    fail_unlock:
        kfree(cn.corename);
        coredump_finish(mm, core_dumped);
        revert_creds(old_cred);
    fail_creds:
        put_cred(cred);
    fail:
        return;
    }

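    format_corename() builds the core file name from core_pattern. It also determines how the core is delivered: if ispipe is true, the data is piped to another program for processing; otherwise it is saved directly to a file.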

    static int format_corename(struct core_name *cn, struct coredump_params *cprm)
    {
        const struct cred *cred = current_cred();
        const char *pat_ptr = core_pattern;
        int ispipe = (*pat_ptr == '|');    /* a leading '|' means the core is piped to a user-space program */
        int pid_in_pattern = 0;
        int err = 0;
    
        cn->used = 0;
        cn->corename = NULL;
        if (expand_corename(cn, core_name_size))
            return -ENOMEM;
        cn->corename[0] = '\0';
    
        if (ispipe)
            ++pat_ptr;
    
        /* Repeat as long as we have more pattern to process and more output
           space */
        while (*pat_ptr) {
            if (*pat_ptr != '%') {
                err = cn_printf(cn, "%c", *pat_ptr++);
            } else {
                switch (*++pat_ptr) {
                /* single % at the end, drop that */
                case 0:
                    goto out;
                /* Double percent, output one percent */
                case '%':
                    err = cn_printf(cn, "%c", '%');
                    break;
                /* pid */
                case 'p':
                    pid_in_pattern = 1;
                    err = cn_printf(cn, "%d",
                              task_tgid_vnr(current));    /* %p: PID (thread group ID) of the dumped process, as seen in its own PID namespace */
                    break;
                /* global pid */
                case 'P':    /* %P: the same PID, as seen in the initial PID namespace */
                    err = cn_printf(cn, "%d",
                              task_tgid_nr(current));
                    break;
                case 'i':
                    err = cn_printf(cn, "%d",
                              task_pid_vnr(current));    /* %i: ID of the thread that triggered the dump, as seen in its own PID namespace */
                    break;
                case 'I':    /* %I: the same thread ID, as seen in the initial PID namespace */
                    err = cn_printf(cn, "%d",
                              task_pid_nr(current));
                    break;
                /* uid */
                case 'u':    /* %u: user ID of the dumped process */
                    err = cn_printf(cn, "%u",
                            from_kuid(&init_user_ns,
                                  cred->uid));
                    break;
                /* gid */
                case 'g':    /* %g: group ID of the dumped process */
                    err = cn_printf(cn, "%u",
                            from_kgid(&init_user_ns,
                                  cred->gid));
                    break;
                case 'd':
                    err = cn_printf(cn, "%d",
                        __get_dumpable(cprm->mm_flags));    /* %d: dump mode, one of SUID_DUMP_DISABLE, SUID_DUMP_USER, SUID_DUMP_ROOT */
                    break;
                /* signal that caused the coredump */
                case 's':
                    err = cn_printf(cn, "%d",
                            cprm->siginfo->si_signo);    /* %s: number of the signal that caused the coredump */
                    break;
                /* UNIX time of coredump */
                case 't': {
                    time64_t time;
    
                    time = ktime_get_real_seconds();
                    err = cn_printf(cn, "%lld", time);    /* %t: time of the coredump */
                    break;
                }
                /* hostname */
                case 'h':    /* %h: host name */
                    down_read(&uts_sem);
                    err = cn_esc_printf(cn, "%s",
                              utsname()->nodename);
                    up_read(&uts_sem);
                    break;
                /* executable */
                case 'e':
                    err = cn_esc_printf(cn, "%s", current->comm);    /* %e: comm value of the dumping task */
                    break;
                case 'E':
                    err = cn_print_exe_file(cn);    /* %E: path name of the executable file */
                    break;
                /* core limit size */
                case 'c':
                    err = cn_printf(cn, "%lu",
                              rlimit(RLIMIT_CORE));    /* %c: core size limit (RLIMIT_CORE) */
                    break;
                default:
                    break;
                }
                ++pat_ptr;
            }
    
            if (err)
                return err;
        }
    
    out:
        if (!ispipe && !pid_in_pattern && core_uses_pid) {
            err = cn_printf(cn, ".%d", task_tgid_vnr(current));
            if (err)
                return err;
        }
        return ispipe;
    }

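    So writing core_%e(%I)_%E(%p)_sig(%s)_time(%t) to core_pattern produces core_<thread name>(<thread ID>)_<executable path>(<process ID>)_sig(<signal number>)_time(<time of the fault>).

    umh_pipe_setup() creates a pipe that connects the kernel's coredump code to the user-space program.

    The kernel writes the coredump data into the pipe, and the user-space program receives and processes it at the other end.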

    static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
    {
        struct file *files[2];
        struct coredump_params *cp = (struct coredump_params *)info->data;
        int err = create_pipe_files(files, 0);    /* create a pipe: files[0] is the read end, files[1] is the write end */
        if (err)
            return err;
    
        cp->file = files[1];    /* cp->file is the write end; the coredump is written here later */
    
        err = replace_fd(0, files[0], 0);    /* install the read end as fd 0 (standard input) of the helper, so the helper receives the coredump data through the pipe */
        fput(files[0]);
        /* and disallow core files too */
        current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};
    
        return err;
    }
    
    int create_pipe_files(struct file **res, int flags)
    {
        int err;
        struct inode *inode = get_pipe_inode();
        struct file *f;
        struct path path;
        static struct qstr name = { .name = "" };
    
        if (!inode)
            return -ENFILE;
    
        err = -ENOMEM;
        path.dentry = d_alloc_pseudo(pipe_mnt->mnt_sb, &name);
        if (!path.dentry)
            goto err_inode;
        path.mnt = mntget(pipe_mnt);
    
        d_instantiate(path.dentry, inode);
    
        f = alloc_file(&path, FMODE_WRITE, &pipefifo_fops);    /* create the write end of the pipe */
        if (IS_ERR(f)) {
            err = PTR_ERR(f);
            goto err_dentry;
        }
    
        f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
        f->private_data = inode->i_pipe;
    
        res[0] = alloc_file(&path, FMODE_READ, &pipefifo_fops);    /* create the read end of the pipe */
        if (IS_ERR(res[0])) {
            err = PTR_ERR(res[0]);
            goto err_file;
        }
    
        path_get(&path);
        res[0]->private_data = inode->i_pipe;
        res[0]->f_flags = O_RDONLY | (flags & O_NONBLOCK);
        res[1] = f;
        return 0;
    
    err_file:
        put_filp(f);
    err_dentry:
        free_pipe_info(inode->i_pipe);
        path_put(&path);
        return err;
    
    err_inode:
        free_pipe_info(inode->i_pipe);
        iput(inode);
        return err;
    }
    
    int replace_fd(unsigned fd, struct file *file, unsigned flags)
    {
        int err;
        struct files_struct *files = current->files;
    
        if (!file)
            return __close_fd(files, fd);
    
        if (fd >= rlimit(RLIMIT_NOFILE))
            return -EBADF;
    
        spin_lock(&files->file_lock);
        err = expand_files(files, fd);
        if (unlikely(err < 0))
            goto out_unlock;
        return do_dup2(files, file, fd, flags);
    
    out_unlock:
        spin_unlock(&files->file_lock);
        return err;
    }
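    On the user-space side, a pipe helper only has to drain its standard input. The following sketch shows the core of such a helper under assumptions: copy_stream() is a name invented here, the destination descriptor is whatever the helper chose to open, and compression or naming policy is left out.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything readable from in_fd to out_fd until end of file.
 * Returns the number of bytes copied, or -1 on a read or write error. */
long long copy_stream(int in_fd, int out_fd)
{
    char buf[65536];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));

        if (n == 0)
            return total;               /* writer closed the pipe: dump complete */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;           /* interrupted, retry the write */
                return -1;
            }
            off += w;                   /* handle short writes */
        }
        total += n;
    }
}
```

    A helper named in core_pattern would typically open its output file and then call copy_stream(STDIN_FILENO, fd), because the read end of the pipe is installed as file descriptor 0.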

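    The Linux kernel supports several linux_binfmt handlers; ELF is the most common one.

    For an ELF program, binfmt in do_coredump() is therefore elf_format, and binfmt->core_dump() is elf_core_dump().

    elf_core_dump() dumps the VMAs of the current process, adds the relevant headers, and writes the result to the file.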

    static struct linux_binfmt elf_format = {
        .module        = THIS_MODULE,
        .load_binary    = load_elf_binary,
        .load_shlib    = load_elf_library,
        .core_dump    = elf_core_dump,
        .min_coredump    = ELF_EXEC_PAGESIZE,
    };
    
    static int elf_core_dump(struct coredump_params *cprm)
    {
        int has_dumped = 0;
        mm_segment_t fs;
        int segs, i;
        size_t vma_data_size = 0;
        struct vm_area_struct *vma, *gate_vma;
        struct elfhdr *elf = NULL;
        loff_t offset = 0, dataoff;
        struct elf_note_info info = { };
        struct elf_phdr *phdr4note = NULL;
        struct elf_shdr *shdr4extnum = NULL;
        Elf_Half e_phnum;
        elf_addr_t e_shoff;
        elf_addr_t *vma_filesz = NULL;
    
        elf = kmalloc(sizeof(*elf), GFP_KERNEL);    /* allocate space for the ELF header */
        if (!elf)
            goto out;
    
        segs = current->mm->map_count;    /* number of memory mappings (VMAs) of the current process */
        segs += elf_core_extra_phdrs();    /* add the extra program headers */
    
        gate_vma = get_gate_vma(current->mm);    /* one more segment for the gate VMA, if there is one */
        if (gate_vma != NULL)
            segs++;
    
        /* for notes section */
        segs++;    /* one segment is reserved for PT_NOTE */
    
        /* If segs > PN_XNUM(0xffff), then e_phnum overflows. To avoid
         * this, kernel supports extended numbering. Have a look at
         * include/linux/elf.h for further information. */
        e_phnum = segs > PN_XNUM ? PN_XNUM : segs;
    
        /*
         * Collect all the non-memory information about the process for the
         * notes.  This also sets up the file header.
         */
        if (!fill_note_info(elf, e_phnum, &info, cprm->siginfo, cprm->regs))    /* fill_note_info() fills in info */
            goto cleanup;
    
        has_dumped = 1;
    
        fs = get_fs();
        set_fs(KERNEL_DS);    /* widen the address limit so that kernel buffers can be passed to the file write path; see "Accessing user-space files from the Linux kernel: using get_fs()/set_fs()" */
    
        offset += sizeof(*elf);                /* Elf header */
        offset += segs * sizeof(struct elf_phdr);    /* Program headers */
    
        /* Write notes phdr entry */
        {
            size_t sz = get_note_info_size(&info);
    
            sz += elf_coredump_extra_notes_size();
    
            phdr4note = kmalloc(sizeof(*phdr4note), GFP_KERNEL);
            if (!phdr4note)
                goto end_coredump;
    
            fill_elf_note_phdr(phdr4note, sz, offset);
            offset += sz;
        }
    
        dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
    
        vma_filesz = kmalloc_array(segs - 1, sizeof(*vma_filesz), GFP_KERNEL);
        if (!vma_filesz)
            goto end_coredump;
    
        for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
                vma = next_vma(vma, gate_vma)) {
            unsigned long dump_size;
    
            dump_size = vma_dump_size(vma, cprm->mm_flags);    /* mm_flags carries coredump_filter and decides which VMAs are dumped and which are skipped */
            vma_filesz[i++] = dump_size;
            vma_data_size += dump_size;
        }
    
        offset += vma_data_size;
        offset += elf_core_extra_data_size();
        e_shoff = offset;
    
        if (e_phnum == PN_XNUM) {
            shdr4extnum = kmalloc(sizeof(*shdr4extnum), GFP_KERNEL);
            if (!shdr4extnum)
                goto end_coredump;
            fill_extnum_info(elf, shdr4extnum, e_shoff, segs);
        }
    
        offset = dataoff;
    
        if (!dump_emit(cprm, elf, sizeof(*elf)))    /* write the ELF header to cprm->file; with a pipe, all of this data goes to the user-space program started by usermodehelper */
            goto end_coredump;
    
        if (!dump_emit(cprm, phdr4note, sizeof(*phdr4note)))    /* write phdr4note (the PT_NOTE program header) to cprm->file */
            goto end_coredump;
    
        /* Write program headers for segments dump */
        for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
                vma = next_vma(vma, gate_vma)) {
            struct elf_phdr phdr;
    
            phdr.p_type = PT_LOAD;
            phdr.p_offset = offset;
            phdr.p_vaddr = vma->vm_start;
            phdr.p_paddr = 0;
            phdr.p_filesz = vma_filesz[i++];
            phdr.p_memsz = vma->vm_end - vma->vm_start;
            offset += phdr.p_filesz;
            phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
            if (vma->vm_flags & VM_WRITE)
                phdr.p_flags |= PF_W;
            if (vma->vm_flags & VM_EXEC)
                phdr.p_flags |= PF_X;
            phdr.p_align = ELF_EXEC_PAGESIZE;
    
            if (!dump_emit(cprm, &phdr, sizeof(phdr)))
                goto end_coredump;
        }
    
        if (!elf_core_write_extra_phdrs(cprm, offset))
            goto end_coredump;
    
         /* write out the notes section */
        if (!write_note_info(&info, cprm))
            goto end_coredump;
    
        if (elf_coredump_extra_notes_write(cprm))
            goto end_coredump;
    
        /* Align to page */
        if (!dump_skip(cprm, dataoff - cprm->pos))
            goto end_coredump;
    
        for (i = 0, vma = first_vma(current, gate_vma); vma != NULL;
                vma = next_vma(vma, gate_vma)) {
            unsigned long addr;
            unsigned long end;
    
            end = vma->vm_start + vma_filesz[i++];
    
            for (addr = vma->vm_start; addr < end; addr += PAGE_SIZE) {
                struct page *page;
                int stop;
    
                page = get_dump_page(addr);
                if (page) {
                    void *kaddr = kmap(page);
                    stop = !dump_emit(cprm, kaddr, PAGE_SIZE);
                    kunmap(page);
                    put_page(page);
                } else
                    stop = !dump_skip(cprm, PAGE_SIZE);
                if (stop)
                    goto end_coredump;
            }
        }
        dump_truncate(cprm);
    
        if (!elf_core_write_extra_data(cprm))
            goto end_coredump;
    
        if (e_phnum == PN_XNUM) {
            if (!dump_emit(cprm, shdr4extnum, sizeof(*shdr4extnum)))
                goto end_coredump;
        }
    
    end_coredump:
        set_fs(fs);
    
    cleanup:
        free_note_info(&info);
        kfree(shdr4extnum);
        kfree(vma_filesz);
        kfree(phdr4note);
        kfree(elf);
    out:
        return has_dumped;
    }
    
    int dump_emit(struct coredump_params *cprm, const void *addr, int nr)
    {
        struct file *file = cprm->file;
        loff_t pos = file->f_pos;
        ssize_t n;
        if (cprm->written + nr > cprm->limit)
            return 0;
        while (nr) {
            if (dump_interrupted())
                return 0;
            n = __kernel_write(file, addr, nr, &pos);
            if (n <= 0)
                return 0;
            file->f_pos = pos;
            cprm->written += n;
            cprm->pos += n;
            nr -= n;
        }
        return 1;
    }

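    To check whether a file is a core file, inspect its ELF header with the readelf command: the type of a core file is CORE (Core file).

    The file command gives the same answer.

    References: "Core file format (structure of a Linux coredump file)"; for how GDB parses a core file, see "How GDB recovers shared-library information from a coredump file".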
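    The check that readelf performs can be approximated in a few lines. The sketch below assumes a Linux system that provides <elf.h>; is_elf_core_header() is a name made up for this example, and the function only looks at the ELF magic, the byte order, and the e_type field (which sits directly after e_ident in both ELF32 and ELF64).

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* buf holds the first len bytes of a file.
 * Returns 1 if they form an ELF header whose type is ET_CORE, 0 otherwise. */
int is_elf_core_header(const unsigned char *buf, size_t len)
{
    unsigned int e_type;

    if (len < EI_NIDENT + 2 || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                       /* too short, or not an ELF file */

    /* e_type is a 16-bit field stored in the file's own byte order */
    if (buf[EI_DATA] == ELFDATA2MSB)
        e_type = ((unsigned int)buf[EI_NIDENT] << 8) | buf[EI_NIDENT + 1];
    else
        e_type = buf[EI_NIDENT] | ((unsigned int)buf[EI_NIDENT + 1] << 8);

    return e_type == ET_CORE;
}
```

    Reading the first 18 bytes of a file and passing them to is_elf_core_header() is enough to tell a core file from an ordinary executable or shared object.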

    3. A coredump example

    The following builds a simple program that produces a coredump and then analyzes it with gdb.

    3.1 Example program

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    int myfunc(int i) {
        *(int*)(NULL) = i; /* line 7 */
        return i - 1;
    }
    
    int main(int argc, char **argv) {
        /* Setup some memory. */
        char data_ptr[] = "string in data segment";
        char *mmap_ptr;
        char *text_ptr = "string in text segment";
        (void)argv;
        mmap_ptr = (char *)malloc(sizeof(data_ptr) + 1);
        strcpy(mmap_ptr, data_ptr);
        mmap_ptr[10] = 'm';
        mmap_ptr[11] = 'm';
        mmap_ptr[12] = 'a';
        mmap_ptr[13] = 'p';
        printf("text addr: %p\n", (void *)text_ptr);
        printf("data addr: %p\n", (void *)data_ptr);
        printf("mmap addr: %p\n", (void *)mmap_ptr);
    
        /* Call a function to prepare a stack trace. */
        return myfunc(argc);
    }

    Compile it with the following command. -ggdb3 produces extra debugging information intended for GDB; 3 is the highest level.

    gcc -ggdb3 -std=c99 -Wall -Wextra -pedantic -o main.out main.c 

    3.2 Analysis with coredump + gdb

    Enable coredump with ulimit -c unlimited, then run ./main.out to produce the core file.

    text addr: 0x4007d4
    data addr: 0x7ffff28fdc30
    mmap addr: 0x10bb010
    Segmentation fault (core dumped)

    Running gdb ./main.out core shows which signal caused the coredump (SIGSEGV), in which file (main.c) and function (myfunc()) it happened, and the exact line of code.

    GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1...
    Reading symbols from ./main.out...done.
    [New LWP 8651]
    Core was generated by `./main.out'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x0000000000400635 in myfunc (i=1) at main.c:7
    7        *(int*)(NULL) = i; /* line 7 */

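    For more detailed core + gdb analysis techniques, see "Offline analysis with core + gdb"; if shared libraries have to be loaded during the analysis, see "GDB shared-library search paths".

    4. Optimizing coredump usage (for embedded systems)

    In /etc/profile, enable coredump and apply the core_pattern configuration: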

    sysctl -p -q -e
    ulimit -c unlimited

    Configure /etc/sysctl.conf:

    kernel.core_pattern=|/usr/bin/coredump_helper.sh core_%e_%I_%p_sig_%s_time_%t.gz
    kernel.core_uses_pid=1

    Add the script that processes the coredump data:

    #!/bin/sh
    
    if [ ! -d "/var/coredump" ];then
        mkdir -p /var/coredump
    fi
    gzip > "/var/coredump/$1"

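    The result is a file named core_<thread name>_<thread ID>_<process ID>_sig_<signal number>_time_<coredump time>.gz in /var/coredump.

    5. Summary

    This article has covered how to configure coredump (ulimit, core_pattern, coredump_filter), what triggers a coredump (SIG_KERNEL_COREDUMP_MASK), how the kernel generates the core file (do_coredump()), how gdb interprets a core file (see "How GDB recovers shared-library information from a coredump file"), and how to analyze a core file with gdb to find the problem (gdb and backtrace).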

  • Original article: https://www.cnblogs.com/arnoldlu/p/11160510.html