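Background
The word "core" originally meant magnetic-core memory, a storage device, and "dump" means to pour out. A core dump, then, is what happens when a process terminates abnormally (for example on a segmentation fault): the contents of its memory at that moment are dumped into a file, the core file.
Anyone who does C++ development on Linux will run into core dumps sooner or later. In C++ interviews, debugging a core dump is almost a guaranteed question, because it tests a candidate's hands-on debugging experience.
A real case I know of: an interviewer asked a candidate to write, on the spot, a program that dumps core, and the candidate was completely stumped. That shows the candidate had never actually debugged one, so passing the interview was out of the question.
Next, using a simple program that dumps core, I'll walk through six ways to debug a core dump. I hope they help in your day-to-day work and, along the way, let you breeze through the easier interview questions.
The example program used throughout this article (test.cpp) is: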
```cpp
#include <stdio.h>

void swap(int *px, int *py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

int main()
{
    int a = 1;
    int b = 2;
    int c = a + b;
    printf("%d, %d, %d\n", a, b, c);

    swap(&a, &b);
    printf("%d, %d, %d\n", a, b, c);

    int *p = NULL;
    *p = 0;  // line 21: write through a NULL pointer -> SIGSEGV

    return 0;
}
```
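Method 1: Code review
Code review is a fairly primitive, brute-force approach. It is workable for small pieces of code. Once the code base reaches tens of thousands of lines, though, you have no idea where to start reading after a core dump. So this method is of little practical value.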
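Method 2: Narrowing it down with log output
Bracketing the fault with log output is another simple method, and it works very well in many situations. Students and new engineers often run into core dumps, and my advice to them is to simply narrow things down with logs. It works a bit like a binary search. Here's how: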
```cpp
#include <stdio.h>

void swap(int *px, int *py)
{
    int tmp = *px;
    *px = *py;
    *py = tmp;
}

int main()
{
    printf("xxx1\n");
    int a = 1;
    printf("xxx2\n");
    int b = 2;
    printf("xxx3\n");
    int c = a + b;
    printf("xxx4\n");
    printf("%d, %d, %d\n", a, b, c);
    printf("xxx5\n");
    swap(&a, &b);
    printf("xxx6\n");
    printf("%d, %d, %d\n", a, b, c);
    printf("xxx7\n");
    int *p = NULL;
    printf("xxx8\n");
    *p = 0;
    printf("xxx9\n");
    printf("xxx10\n");
    return 0;
}
```
Compile and run it:
```
ubuntu@VM-0-15-ubuntu:~$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ ./a.out
xxx1
xxx2
xxx3
xxx4
1, 2, 3
xxx5
xxx6
2, 1, 3
xxx7
xxx8
Segmentation fault (core dumped)
ubuntu@VM-0-15-ubuntu:~$
```
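xxx8 is printed but xxx9 is not. The fault must therefore lie in the statement between them, `*p = 0;`, which is line 21 of the original test.cpp.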
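Method 3: dmesg + addr2line
Sometimes core dumps are disabled and no core file is produced. What then? There is still a way: use dmesg and addr2line. The kernel logs each segfault, including the faulting instruction pointer (`ip`). addr2line translates that address back into a source file and line, using the debug information produced by `-g`. For details on both commands, see their man pages. Here's the debugging session: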
```
ubuntu@VM-0-15-ubuntu:~$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ ./a.out
Segmentation fault (core dumped)
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ dmesg
a.out[3709]: segfault at 0 ip 080483c9 sp bff75a60 error 6 in a.out[8048000+1000]
ubuntu@VM-0-15-ubuntu:~$ addr2line -e a.out 080483c9
/home/ubuntu/test.cpp:21
```
Clearly, line 21 of the code is the culprit.
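Method 4: strace + addr2line
Next, an important Linux command: strace. Its man page explains that it traces system calls (and signals), so I won't dwell on it. With `-i`, strace prints the instruction pointer in brackets in front of each line. On the SIGSEGV line, that bracketed address is where the fault occurred, and we feed it to addr2line. Here's the session: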
```
ubuntu@VM-0-15-ubuntu:~$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ strace -i ./a.out
[00ff4424] execve("./a.out", ["./a.out"], [/* 22 vars */]) = 0
[0086e2fd] brk(0) = 0x818e000
[0086f6d3] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb771c000
[0086f5d1] access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
[0086f494] open("/etc/ld.so.cache", O_RDONLY) = 3
[0086f45e] fstat64(3, {st_mode=S_IFREG|0644, st_size=49072, ...}) = 0
[0086f6d3] mmap2(NULL, 49072, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7710000
[0086f4cd] close(3) = 0
[0086f494] open("/lib/libc.so.6", O_RDONLY) = 3
[0086f514] read(3, "\177ELF\1\1\1\3\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0 N\211\0004\0\0\0"..., 512) = 512
[0086f45e] fstat64(3, {st_mode=S_IFREG|0755, st_size=1855584, ...}) = 0
[0086f6d3] mmap2(0x87e000, 1620360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x87e000
[0086f754] mprotect(0xa03000, 4096, PROT_NONE) = 0
[0086f6d3] mmap2(0xa04000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x185) = 0xa04000
[0086f6d3] mmap2(0xa07000, 10632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xa07000
[0086f4cd] close(3) = 0
[0086f6d3] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb770f000
[0085a552] set_thread_area({entry_number:-1 -> 6, base_addr:0xb770f6c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
[0086f754] mprotect(0xa04000, 8192, PROT_READ) = 0
[0086f754] mprotect(0x876000, 4096, PROT_READ) = 0
[0086f711] munmap(0xb7710000, 49072) = 0
[00ba1424] fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
[00ba1424] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb771b000
[00ba1424] write(1, "1, 2, 3\n", 81, 2, 3
) = 8
[00ba1424] write(1, "2, 1, 3\n", 82, 1, 3
) = 8
[08048479] --- SIGSEGV (Segmentation fault) @ 0 (0) ---
[????????] +++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ addr2line -e a.out 08048479
/home/ubuntu/test.cpp:21
```
Again, line 21 of the code is the culprit.
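Method 5: valgrind
I introduced valgrind in an earlier article on debugging memory leaks. It can catch many other kinds of memory errors too and is very powerful. Let's see how it handles a core dump: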
```
ubuntu@VM-0-15-ubuntu:~$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ valgrind -v ./a.out
==23889== Memcheck, a memory error detector
==23889== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==23889== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==23889== Command: ./a.out
...... (some non-essential output omitted)
==23889== Invalid write of size 4
==23889== at 0x4006D6: main (test.cpp:21)
==23889== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==23889==
==23889==
==23889== Process terminating with default action of signal 11 (SIGSEGV)
==23889== Access not within mapped region at address 0x0
==23889== at 0x4006D6: main (test.cpp:21)
==23889== If you believe this happened as a result of a stack
==23889== overflow in your program's main thread (unlikely but
==23889== possible), you can try to increase the size of the
==23889== main thread stack using the --main-stacksize= flag.
==23889== The main thread stack size used in this run was 8388608.
--23889-- REDIR: 0x4ebe4f0 (libc.so.6:free) redirected to 0x4c2ed80 (free)
==23889==
==23889== HEAP SUMMARY:
==23889== in use at exit: 0 bytes in 0 blocks
==23889== total heap usage: 1 allocs, 1 frees, 1,024 bytes allocated
==23889==
==23889== All heap blocks were freed -- no leaks are possible
==23889==
==23889== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==23889==
==23889== 1 errors in context 1 of 1:
==23889== Invalid write of size 4
==23889== at 0x4006D6: main (test.cpp:21)
==23889== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==23889==
==23889== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
ubuntu@VM-0-15-ubuntu:~$
```
valgrind reports an invalid 4-byte write to address 0x0 at test.cpp:21, and the process dies with SIGSEGV at that same line.
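Method 6: gdb
gdb is the centerpiece of this article, and it comes up in almost every written test and interview. Without further ado, here's how. You can use either of two forms:
- `gdb a.out core` loads an existing core file without re-running a.out.
- `gdb a.out` starts a.out again under the debugger.

The session below uses the second form: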
```
ubuntu@VM-0-15-ubuntu:~$ g++ -g test.cpp
ubuntu@VM-0-15-ubuntu:~$
ubuntu@VM-0-15-ubuntu:~$ gdb a.out
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...done.
(gdb) r
Starting program: /home/ubuntu/a.out
1, 2, 3
2, 1, 3

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400646 in main () at test.cpp:21
21 *p = 0;
(gdb) bt
#0 0x0000000000400646 in main () at test.cpp:21
```
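The program crashed at line 21, and `bt` shows the call stack at the point of the crash. gdb is especially important; it is a skill you must master.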
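Final Words
There are countless methods, but fixing the problem is what counts. In future articles, I'll cover more debugging methods and techniques for hunting down bugs quickly, so that everyone can work less overtime. Good luck.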
