1.Introduction
The Valgrind tool suite provides a number of debugging and profiling tools that help you make your programs faster and more correct. The most popular of these tools is called Memcheck. It can detect many memory-related errors that are common in C and C++ programs and that can lead to crashes and unpredictable behaviour.
The rest of this guide gives the minimum information you need to start detecting memory errors in your program with Memcheck. For full documentation of Memcheck and the other tools, please read the User Manual.
Compile your program with -g
to include debugging information so that Memcheck's error messages include exact line numbers. Using -O0
is also a good idea, if you can tolerate the slowdown. With -O1
line numbers in error messages can be inaccurate, although generally speaking running Memcheck on code compiled at -O1
works fairly well, and the speed improvement compared to running -O0
is quite significant. Use of -O2
and above is not recommended as Memcheck occasionally reports uninitialised-value errors which don't really exist.
If you normally run your program like this:
myprog arg1 arg2
Use this command line:
valgrind --leak-check=yes myprog arg1 arg2
Memcheck is the default tool. The --leak-check
option turns on the detailed memory leak detector.
Your program will run much slower (eg. 20 to 30 times) than normal, and use a lot more memory. Memcheck will issue messages about memory errors and leaks that it detects.
4. Memcheck can detect:
- Use of uninitialised memory
- Reading/writing memory after it has been free'd
- Reading/writing off the end of malloc'd blocks
- Reading/writing inappropriate areas on the stack
- Memory leaks -- where pointers to malloc'd blocks are lost forever
- Mismatched use of malloc/new/new [] vs free/delete/delete []
- Overlapping src and dst pointers in memcpy() and related functions
- Some misuses of the POSIX pthreads API
This outputs a report to the terminal like
==9704== Memcheck, a memory error detector for x86-linux. ==9704== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al. ==9704== Using valgrind-2.2.0, a program supervision framework for x86-linux. ==9704== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al. ==9704== For more details, rerun with: -v ==9704== ==9704== ==9704== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 11 from 1) ==9704== malloc/free: in use at exit: 35 bytes in 2 blocks. ==9704== malloc/free: 3 allocs, 1 frees, 47 bytes allocated. ==9704== For counts of detected errors, rerun with: -v ==9704== searching for pointers to 2 not-freed blocks. ==9704== checked 1420940 bytes. ==9704== ==9704== 16 bytes in 1 blocks are definitely lost in loss record 1 of 2 ==9704== at 0x1B903D38: malloc (vg_replace_malloc.c:131) ==9704== by 0x80483BF: main (test.c:15) ==9704== ==9704== ==9704== 19 bytes in 1 blocks are definitely lost in loss record 2 of 2 ==9704== at 0x1B903D38: malloc (vg_replace_malloc.c:131) ==9704== by 0x8048391: main (test.c:8) ==9704== ==9704== LEAK SUMMARY: ==9704== definitely lost: 35 bytes in 2 blocks. ==9704== possibly lost: 0 bytes in 0 blocks. ==9704== still reachable: 0 bytes in 0 blocks. ==9704== suppressed: 0 bytes in 0 blocks.
Let's look at the code to see what happened. Allocation #1 (19 byte leak) is lost because p is pointed elsewhere before the memory from Allocation #1 is free'd. To help us track it down, Valgrind gives us a stack trace showing where the bytes were allocated. In the 19 byte leak entry, the bytes were allocate in test.c, line 8. Allocation #2 (12 byte leak) doesn't show up in the list because it is free'd. Allocation #3 shows up in the list even though there is still a reference to it (p) at program termination. This is still a memory leak! Again, Valgrind tells us where to look for the allocation (test.c line 15).
5.explanation of the various error messages
Despite considerable sophistication under the hood, Memcheck can only really detect two kinds of errors, use of illegal addresses, and use of undefined values. Nevertheless, this is enough to help you discover all sorts of memory-management nasties in your code. This section presents a quick summary of what error messages mean.
5.1 Illegal read / Illegal write errors
For example:
Invalid read of size 4 at 0x40F6BBCC: (within /usr/lib/libpng.so.2.1.0.9) by 0x40F6B804: (within /usr/lib/libpng.so.2.1.0.9) by 0x40B07FF4: read_png_image__FP8QImageIO (kernel/qpngio.cpp:326) by 0x40AC751B: QImageIO::read() (kernel/qimage.cpp:3621) Address 0xBFFFF0E0 is not stack'd, malloc'd or free'd
This happens when your program reads or writes memory at a place which Memcheck reckons it shouldn't. In this example, the program did a 4-byte read at address 0xBFFFF0E0, somewhere within the system-supplied library libpng.so.2.1.0.9, which was called from somewhere else in the same library, called from line 326 of qpngio.cpp, and so on.
Memcheck tries to establish what the illegal address might relate to, since that's often useful. So, if it points into a block of memory which has already been freed, you'll be informed of this, and also where the block was free'd at. Likewise, if it should turn out to be just off the end of a malloc'd block, a common result of off-by-one-errors in array subscripting, you'll be informed of this fact, and also where the block was malloc'd.
In this example, Memcheck can't identify the address. Actually the address is on the stack, but, for some reason, this is not a valid stack address -- it is below the stack pointer, %esp, and that isn't allowed. In this particular case it's probably caused by gcc generating invalid code, a known bug in various flavours of gcc.
Note that Memcheck only tells you that your program is about to access memory at an illegal address. It can't stop the access from happening. So, if your program makes an access which normally would result in a segmentation fault, you program will still suffer the same fate -- but you will get a message from Memcheck immediately prior to this. In this particular example, reading junk on the stack is non-fatal, and the program stays alive.
5.2 Use of uninitialised values
For example:
Conditional jump or move depends on uninitialised value(s) at 0x402DFA94: _IO_vfprintf (_itoa.h:49) by 0x402E8476: _IO_printf (printf.c:36) by 0x8048472: main (tests/manuel1.c:8) by 0x402A6E5E: __libc_start_main (libc-start.c:129)
An uninitialised-value use error is reported when your program uses a value which hasn't been initialised -- in other words, is undefined. Here, the undefined value is used somewhere inside the printf() machinery of the C library. This error was reported when running the following small program:
int main() { int x; printf ("x = %d\n", x); }
It is important to understand that your program can copy around junk (uninitialised) data to its heart's content. Memcheck observes this and keeps track of the data, but does not complain. A complaint is issued only when your program attempts to make use of uninitialised data. In this example, x is uninitialised. Memcheck observes the value being passed to _IO_printf and thence to _IO_vfprintf, but makes no comment. However, _IO_vfprintf has to examine the value of x so it can turn it into the corresponding ASCII string, and it is at this point that Memcheck complains.
Sources of uninitialised data tend to be:
- Local variables in procedures which have not been initialised, as in the example above.
- The contents of malloc'd blocks, before you write something there. In C++, the new operator is a wrapper round malloc, so if you create an object with new, its fields will be uninitialised until you (or the constructor) fill them in, which is only Right and Proper.
5.3 Illegal frees
For example:
Invalid free() at 0x4004FFDF: free (vg_clientmalloc.c:577) by 0x80484C7: main (tests/doublefree.c:10) by 0x402A6E5E: __libc_start_main (libc-start.c:129) by 0x80483B1: (within tests/doublefree) Address 0x3807F7B4 is 0 bytes inside a block of size 177 free'd at 0x4004FFDF: free (vg_clientmalloc.c:577) by 0x80484C7: main (tests/doublefree.c:10) by 0x402A6E5E: __libc_start_main (libc-start.c:129) by 0x80483B1: (within tests/doublefree)
Memcheck keeps track of the blocks allocated by your program with malloc/new, so it can know exactly whether or not the argument to free/delete is legitimate or not. Here, this test program has freed the same block twice. As with the illegal read/write errors, Memcheck attempts to make sense of the address free'd. If, as here, the address is one which has previously been freed, you wil be told that -- making duplicate frees of the same block easy to spot.
5.4 When a block is freed with an inappropriate deallocation function
In the following example, a block allocated with new[]
has wrongly been deallocated with free
:
Mismatched free() / delete / delete [] at 0x40043249: free (vg_clientfuncs.c:171) by 0x4102BB4E: QGArray::~QGArray(void) (tools/qgarray.cpp:149) by 0x4C261C41: PptDoc::~PptDoc(void) (include/qmemarray.h:60) by 0x4C261F0E: PptXml::~PptXml(void) (pptxml.cc:44) Address 0x4BB292A8 is 0 bytes inside a block of size 64 alloc'd at 0x4004318C: __builtin_vec_new (vg_clientfuncs.c:152) by 0x4C21BC15: KLaola::readSBStream(int) const (klaola.cc:314) by 0x4C21C155: KLaola::stream(KLaola::OLENode const *) (klaola.cc:416) by 0x4C21788F: OLEFilter::convert(QCString const &) (olefilter.cc:272)
The following was told to me be the KDE 3 developers. I didn't know any of it myself. They also implemented the check itself.
In C++ it's important to deallocate memory in a way compatible with how it was allocated. The deal is:
- If allocated with
malloc
,calloc
,realloc
,valloc
ormemalign
, you must deallocate withfree
. - If allocated with
new[]
, you must deallocate withdelete[]
. - If allocated with
new
, you must deallocate withdelete
.
The worst thing is that on Linux apparently it doesn't matter if you do muddle these up, and it all seems to work ok, but the same program may then crash on a different platform, Solaris for example. So it's best to fix it properly. According to the KDE folks "it's amazing how many C++ programmers don't know this".
Pascal Massimino adds the following clarification: delete[]
must be called associated with a new[]
because the compiler stores the size of the array and the pointer-to-member to the destructor of the array's content just before the pointer actually returned. This implies a variable-sized overhead in what's returned by new
or new[]
. It rather surprising how compilers [Ed: runtime-support libraries?] are robust to mismatch in new
/delete
new[]
/delete[]
.
5.5 Passing system call parameters with inadequate read/write permissions
Memcheck checks all parameters to system calls. If a system call needs to read from a buffer provided by your program, Memcheck checks that the entire buffer is addressible and has valid data, ie, it is readable. And if the system call needs to write to a user-supplied buffer, Memcheck checks that the buffer is addressible. After the system call, Memcheck updates its administrative information to precisely reflect any changes in memory permissions caused by the system call.
Here's an example of a system call with an invalid parameter:
#include <stdlib.h> #include <unistd.h> int main( void ) { char* arr = malloc(10); (void) write( 1 /* stdout */, arr, 10 ); return 0; }
You get this complaint ...
Syscall param write(buf) contains uninitialised or unaddressable byte(s) at 0x4035E072: __libc_write by 0x402A6E5E: __libc_start_main (libc-start.c:129) by 0x80483B1: (within tests/badwrite) by <bogus frame pointer> ??? Address 0x3807E6D0 is 0 bytes inside a block of size 10 alloc'd at 0x4004FEE6: malloc (ut_clientmalloc.c:539) by 0x80484A0: main (tests/badwrite.c:6) by 0x402A6E5E: __libc_start_main (libc-start.c:129) by 0x80483B1: (within tests/badwrite)
... because the program has tried to write uninitialised junk from the malloc'd block to the standard output.
5.6 Overlapping source and destination blocks
The following C library functions copy some data from one memory block to another (or something similar): memcpy()
, strcpy()
, strncpy()
, strcat()
, strncat()
. The blocks pointed to by their src
and dst
pointers aren't allowed to overlap. Memcheck checks for this.
For example:
==27492== Source and destination overlap in memcpy(0xbffff294, 0xbffff280, 21) ==27492== at 0x40026CDC: memcpy (mc_replace_strmem.c:71) ==27492== by 0x804865A: main (overlap.c:40) ==27492== by 0x40246335: __libc_start_main (../sysdeps/generic/libc-start.c:129) ==27492== by 0x8048470: (within /auto/homes/njn25/grind/head6/memcheck/tests/overlap) ==27492==
You don't want the two blocks to overlap because one of them could get partially trashed by the copying.