4.3. Explanation of error messages from Memcheck
Despite considerable sophistication under the hood, Memcheck can only really detect two kinds of errors: use of illegal addresses, and use of undefined values. Nevertheless, this is enough to help you discover all sorts of memory-management problems in your code.
This section presents a quick summary of what error messages mean. The precise behaviour of the error-checking machinery is described in Details of Memcheck's checking machinery.
For example:
Invalid read of size 4
at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9)
by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9)
by 0x40B07FF4: read_png_image(QImageIO *) (kernel/qpngio.cpp:326)
by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621)
Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
This happens when your program reads or writes memory at a place which Memcheck reckons it shouldn't.
In this example, the program did a 4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied library libpng.so.2.1.0.9, which was called from somewhere else in the same library, called from line 326 of qpngio.cpp, and so on.
Memcheck tries to establish what the illegal address might relate to, since that's often useful. So, if it points into a block of memory which has already been freed, you'll be informed of this, and also where the block was freed. Likewise, if it should turn out to be just off the end of a malloc'd block, a common result of off-by-one errors in array subscripting, you'll be informed of this fact, and also where the block was malloc'd.
In this example, Memcheck can't identify the address. Actually the address is on the stack, but, for some reason, this is not a valid stack address -- it is below the stack pointer and that isn't allowed. In this particular case it's probably caused by gcc generating invalid code, a known bug in some ancient versions of gcc.
Note that Memcheck only tells you that your program is about to access memory at an illegal address. It can't stop the access from happening. So, if your program makes an access which normally would result in a segmentation fault, your program will still suffer the same fate, but you will get a message from Memcheck immediately prior to this. In this particular example, reading junk on the stack is non-fatal, and the program stays alive.
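The off-by-one case mentioned earlier in this section can be sketched in C as follows. This is only an illustration: the function name sum_ints and the way the block is filled are made up here and do not come from the Valgrind sources or tests. The code shown is the correct version, and the comment marks where the typical mistake would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: fills a heap block with 0..n-1 and returns the sum,
   or -1 if allocation fails. */
long sum_ints(size_t n)
{
    assert(n > 0);
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1;

    for (size_t i = 0; i < n; i++)
        a[i] = (int)i;

    long total = 0;
    /* The correct bound is i < n. Writing i <= n would read a[n], the
       element just past the end of the malloc'd block, which is the kind
       of access Memcheck reports as an invalid read and relates back to
       the malloc call above. */
    for (size_t i = 0; i < n; i++)
        total += a[i];

    free(a);
    return total;
}
```

Running a program that calls the buggy variant under Memcheck should, under these assumptions, produce a report similar in shape to the one above, except that the address would be described relative to the allocated block.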
For example:
Conditional jump or move depends on uninitialised value(s)
at 0x402DFA94: _IO_vfprintf (_itoa.h:49)
by 0x402E8476: _IO_printf (printf.c:36)
by 0x8048472: main (tests/manuel1.c:8)
An uninitialised-value use error is reported when your program uses a value which hasn't been initialised -- in other words, is undefined. Here, the undefined value is used somewhere inside the printf() machinery of the C library. This error was reported when running the following small program:
#include <stdio.h>

int main(void)
{
    int x;                  /* deliberately left uninitialised */
    printf("x = %d\n", x);
    return 0;
}
It is important to understand that your program can copy around junk (uninitialised) data as much as it likes. Memcheck observes this and keeps track of the data, but does not complain. A complaint is issued only when your program attempts to make use of uninitialised data. In this example, x is uninitialised. Memcheck observes the value being passed to _IO_printf and thence to _IO_vfprintf, but makes no comment. However, _IO_vfprintf has to examine the value of x so it can turn it into the corresponding ASCII string, and it is at this point that Memcheck complains.
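The difference between copying undefined data and using it can be sketched as below. This is a minimal illustration under stated assumptions: copy_then_use and its parameters are hypothetical names, and the remarks about where Memcheck would speak up follow the description above rather than any captured output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: copies an undefined block, then defines the copy
   before branching on it. Returns how many of the n values are positive,
   or -1 if allocation fails. */
int copy_then_use(size_t n, int fill)
{
    assert(n > 0);
    int *src = malloc(n * sizeof *src);   /* contents undefined */
    int *dst = malloc(n * sizeof *dst);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1;
    }

    /* Copying undefined bytes: Memcheck tracks them but stays silent. */
    memcpy(dst, src, n * sizeof *dst);

    /* Define every element before any decision depends on it. */
    for (size_t i = 0; i < n; i++)
        dst[i] = fill;

    int positives = 0;
    for (size_t i = 0; i < n; i++) {
        /* A branch on dst[i] counts as a use. Without the loop above,
           this is where a complaint would be expected. */
        if (dst[i] > 0)
            positives++;
    }

    free(src);
    free(dst);
    return positives;
}
```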
Sources of uninitialised data tend to be:
- Local variables in procedures which have not been initialised, as in the example above.
- The contents of malloc'd blocks, before you write something there. In C++, the new operator is a wrapper round malloc, so if you create an object with new, its fields will be uninitialised until you (or the constructor) fill them in.
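For the second source, one common way to avoid undefined heap contents in C is to allocate with calloc, which zero-fills the block. The sketch below assumes a hypothetical helper named make_counters; it is an illustration, not something taken from the Valgrind documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns n counters that all start at zero, or NULL
   if allocation fails. The caller releases the block with free(). */
int *make_counters(size_t n)
{
    assert(n > 0);
    /* calloc zero-fills the block, so every counter is defined from the
       start. malloc(n * sizeof(int)) would return the same amount of
       memory with undefined contents. */
    return calloc(n, sizeof(int));
}
```

Zero-filling is only appropriate when zero is a meaningful initial value; otherwise it can hide a missing initialisation that Memcheck would have pointed out.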
For example:
Invalid free()
at 0x4004FFDF: free (vg_clientmalloc.c:577)
by 0x80484C7: main (tests/doublefree.c:10)
Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd
at 0x4004FFDF: free (vg_clientmalloc.c:577)
by 0x80484C7: main (tests/doublefree.c:10)
Memcheck keeps track of the blocks allocated by your program with malloc/new, so it knows exactly whether the argument to free/delete is legitimate. Here, this test program has freed the same block twice. As with the illegal read/write errors, Memcheck attempts to make sense of the address freed. If, as here, the address is one which has previously been freed, you will be told that, making duplicate frees of the same block easy to spot.
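One defensive habit that avoids this kind of duplicate free in C is to clear the pointer as soon as the block is released. The helper below, free_and_clear, is a hypothetical name used only for this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: frees *pp and clears the caller's pointer, so a
   repeated call becomes free(NULL), which the C standard defines as a
   no-op. */
void free_and_clear(char **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}
```

This only protects the one pointer variable that is passed in. Any other copies of the old address still dangle, and freeing through them would still be an invalid free.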
In the following example, a block allocated with new[] has wrongly been deallocated with free:
Mismatched free() / delete / delete []
at 0x40043249: free (vg_clientfuncs.c:171)
by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149)
by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60)
by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44)
Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd
at 0x4004318C: operator new[](unsigned int) (vg_clientfuncs.c:152)
by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314)
by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416)
by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
In C++ it's important to deallocate memory in a way compatible with how it was allocated. The deal is:
- If allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free.
- If allocated with new[], you must deallocate with delete[].
- If allocated with new, you must deallocate with delete.
The worst thing is that on Linux mixing these up apparently causes no visible problem, but the same program may then crash on a different platform, Solaris for example. So it's best to fix it properly. According to the KDE folks, "it's amazing how many C++ programmers don't know this".
The reason behind the requirement is as follows. In some C++ implementations, delete[] must be used for objects allocated by new[] because the compiler stores the size of the array and the pointer-to-member to the destructor of the array's content just before the pointer actually returned. This implies a variable-sized overhead in what's returned by new or new[].
Memcheck checks all parameters to system calls:
- It checks all the direct parameters themselves.
- Also, if a system call needs to read from a buffer provided by your program, Memcheck checks that the entire buffer is addressable and has valid data, i.e., it is readable.
- Also, if the system call needs to write to a user-supplied buffer, Memcheck checks that the buffer is addressable.
After the system call, Memcheck updates its tracked information to precisely reflect any changes in memory permissions caused by the system call.
Here's an example of two system calls with invalid parameters:
#include <stdlib.h>
#include <unistd.h>
int main( void )
{
  char* arr = malloc(10);
  int* arr2 = malloc(sizeof(int));
  write( 1 /* stdout */, arr, 10 );
  exit(arr2[0]);
}
You get these complaints ...
Syscall param write(buf) points to uninitialised byte(s)
at 0x25A48723: __write_nocancel (in /lib/tls/libc-2.3.3.so)
by 0x259AFAD3: __libc_start_main (in /lib/tls/libc-2.3.3.so)
by 0x8048348: (within /auto/homes/njn25/grind/head4/a.out)
Address 0x25AB8028 is 0 bytes inside a block of size 10 alloc'd
at 0x259852B0: malloc (vg_replace_malloc.c:130)
by 0x80483F1: main (a.c:5)
Syscall param exit(error_code) contains uninitialised byte(s)
at 0x25A21B44: __GI__exit (in /lib/tls/libc-2.3.3.so)
by 0x8048426: main (a.c:8)
... because the program has (a) tried to write uninitialised junk from the malloc'd block to the standard output, and (b) passed an uninitialised value to exit. Note that the first error refers to the memory pointed to by buf (not buf itself), but the second error refers directly to exit's argument arr2[0].
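A version of the write call that should not trigger the first complaint can be sketched as follows. The helper name write_initialised and the fill character are assumptions made for illustration. The same idea applies to exit: give its argument a defined value before the call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes a fully initialised 10-byte heap buffer to
   fd and returns what write() returned, or -1 if allocation fails. */
ssize_t write_initialised(int fd)
{
    assert(fd >= 0);
    char *arr = malloc(10);
    if (arr == NULL)
        return -1;

    /* Define every byte of the buffer before the system call reads it. */
    memset(arr, '.', 10);

    ssize_t written = write(fd, arr, 10);
    free(arr);
    return written;
}
```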
The following C library functions copy some data from one memory block to another (or something similar): memcpy(), strcpy(), strncpy(), strcat(), strncat(). The blocks pointed to by their src and dst pointers aren't allowed to overlap. Memcheck checks for this.
For example:
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21)
==27492== at 0x40026CDC: memcpy (mc_replace_strmem.c:71)
==27492== by 0x804865A: main (overlap.c:40)
You don't want the two blocks to overlap because one of them could get partially overwritten by the copying.
You might think that Memcheck is being overly pedantic reporting this in the case where dst is less than src. For example, the obvious way to implement memcpy() is by copying from the first byte to the last. However, the optimisation guides of some architectures recommend copying from the last byte down to the first. Also, some implementations of memcpy() zero dst before copying, because zeroing the destination's cache line(s) can improve performance.
In addition, for many of these functions, the POSIX standards have wording along the lines "If copying takes place between objects that overlap, the behavior is undefined." Hence overlapping copies violate the standard.
The moral of the story is: if you want to write truly portable code, don't make any assumptions about the language implementation.
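When the source and destination genuinely need to overlap, the standard C answer is memmove, which is specified to behave correctly for overlapping blocks. The helper below, shift_right, is a hypothetical example written for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: moves the first len bytes of buf to start at
   offset shift. cap is the total size of buf. */
void shift_right(char *buf, size_t cap, size_t len, size_t shift)
{
    assert(buf != NULL);
    assert(shift <= cap && len <= cap - shift);

    /* Source [buf, buf+len) and destination [buf+shift, buf+shift+len)
       overlap whenever 0 < shift < len. memmove is defined for that
       case; memcpy is not. */
    memmove(buf + shift, buf, len);
}
```

For example, shifting the five bytes "abcde" by two positions inside an eight-byte buffer leaves "abcde" starting at offset 2, with the first two bytes unchanged.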
Memcheck keeps track of all memory blocks issued in response to calls to malloc/calloc/realloc/new. So when the program exits, it knows which blocks have not been freed.
If --leak-check is set appropriately, for each remaining block, Memcheck scans the entire address space of the process, looking for pointers to the block. Each block fits into one of the three following categories.
- Still reachable: A pointer to the start of the block is found. This usually indicates programming sloppiness. Since the block is still pointed at, the programmer could, at least in principle, free it before program exit. Because these are very common and arguably not a problem, Memcheck won't report such blocks unless --show-reachable=yes is specified.
- Possibly lost, or "dubious": A pointer to the interior of the block is found. The pointer might originally have pointed to the start and have been moved along, or it might be entirely unrelated. Memcheck deems such a block as "dubious", because it's unclear whether or not a pointer to it still exists.
- Definitely lost, or "leaked": The worst outcome is that no pointer to the block can be found. The block is classified as "leaked", because the programmer could not possibly have freed it at program exit, since no pointer to it exists. This is likely a symptom of having lost the pointer at some earlier point in the program.
For each block mentioned, Memcheck will also tell you where the block was allocated. It cannot tell you how or why the pointer to a leaked block has been lost; you have to work that out for yourself. In general, you should attempt to ensure your programs do not have any leaked or dubious blocks at exit.
For example:
8 bytes in 1 blocks are definitely lost in loss record 1 of 14
at 0x........: malloc (vg_replace_malloc.c:...)
by 0x........: mk (leak-tree.c:11)
by 0x........: main (leak-tree.c:39)
88 (8 direct, 80 indirect) bytes in 1 blocks are definitely lost
in loss record 13 of 14
at 0x........: malloc (vg_replace_malloc.c:...)
by 0x........: mk (leak-tree.c:11)
by 0x........: main (leak-tree.c:25)
The first message describes a simple case of a single 8-byte block that has been definitely lost. The second case mentions both "direct" and "indirect" leaks. The distinction is that a direct leak is a block which has no pointers to it. An indirect leak is a block which is only pointed to by other leaked blocks. Both kinds of leak are bad.
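The direct/indirect distinction is easiest to see with a linked structure. The sketch below is an illustration only: the node type and the helpers make_chain and free_chain are hypothetical and are not the leak-tree.c test named in the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical singly linked node. */
struct node {
    struct node *next;
};

/* Frees every node reachable from head and returns how many were freed. */
size_t free_chain(struct node *head)
{
    size_t freed = 0;
    while (head != NULL) {
        struct node *next = head->next;  /* read the link before freeing */
        free(head);
        head = next;
        freed++;
    }
    return freed;
}

/* Builds a chain of n nodes; returns the head, or NULL on failure. */
struct node *make_chain(size_t n)
{
    assert(n > 0);
    struct node *head = NULL;
    for (size_t i = 0; i < n; i++) {
        struct node *fresh = malloc(sizeof *fresh);
        if (fresh == NULL) {
            free_chain(head);            /* do not leak a partial chain */
            return NULL;
        }
        fresh->next = head;
        head = fresh;
    }
    return head;
}
```

If a program dropped the pointer returned by make_chain without calling free_chain, the head node would have no pointer to it (a direct leak), while the remaining nodes would be pointed to only by leaked nodes (indirect leaks).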
The precise area of memory in which Memcheck searches for pointers is: all naturally-aligned machine-word-sized words found in memory that Memcheck's records indicate is both accessible and initialised.