Sunday, June 26, 2011

Valgrind

1.1. Why Valgrind?
As said above, memory management is prone to errors that are too hard to detect. Common errors may be listed as:
  1. Use of uninitialized memory
  2. Reading/writing memory after it has been freed
  3. Reading/writing off the end of malloc'd blocks
  4. Reading/writing inappropriate areas on the stack
  5. Memory leaks -- where pointers to malloc'd blocks are lost forever
  6. Mismatched use of malloc/new/new[] vs free/delete/delete[]
  7. Some misuses of the POSIX pthreads API
These errors usually lead to crashes.
This is a situation where we need Valgrind. Valgrind works directly with the executables, with no need to recompile, relink or modify the program to be checked. Valgrind decides whether the program should be modified to avoid memory leak, and also points out the spots of "leak."
Valgrind simulates every single instruction your program executes. For this reason, Valgrind finds errors not only in your application but also in all supporting dynamically-linked (.so-format) libraries, including the GNU C library, the X client libraries, Qt if you work with KDE, and so on. That often includes libraries, for example the GNU C library, which may contain memory access violations.
1.2. Usage
1.2.1. Invoking Valgrind
The checking may be performed by simply placing the word valgrind just before the normal command used to invoke the program. For example:


#valgrind ps -ax
Valgrind provides thousands of options. We deliberately avoid them, not to make this article boring.
The output contains the usual output of ps -ax also with the detailed report by valgrind. Any error (memory related) is pointed out in the error report.
1.2.2. How to Identify the Error from the Error Report
Consider the output of Valgrind for some test program:


   ==1353== Invalid read of size 4
   ==1353==    at 0x80484F6: print (valg_eg.c:7)
   ==1353==    by 0x8048561: main (valg_eg.c:16)
   ==1353==    by 0x4026D177: __libc_start_main
(../sysdeps/generic/libc-start.c   :129)
   ==1353==    by 0x80483F1: free@@GLIBC_2.0 (in /home/deepu/valg/a.out)
   ==1353==    Address 0x40C9104C is 0 bytes after a block of size 40
alloc'd
   ==1353==    at 0x40046824: malloc (vg_clientfuncs.c:100)
   ==1353==    by 0x8048524: main (valg_eg.c:12)
   ==1353==    by 0x4026D177: __libc_start_main
(../sysdeps/generic/libc-start.c   :129)
   ==1353==    by 0x80483F1: free@@GLIBC_2.0 (in /home/deepu/valg/a.out)
Here, 1353 is the process ID. This part of the error report says that a read error has occurred at line number 7, in the function print. The function print is called by function main, and both are in the file valg_eg.c. The function main is called by the function __libc_start_main at line number 129, in ../sysdeps/generic/libc-start.c. The function __libc_start_main is called by free@@GLIBC_2.0 in the file /home/deepu/valg/a.out. Similarly details of calling malloc are also given.
1.2.3. Types of Errors with Examples
Valgrind can only really detect two types of errors: use of illegal address and use of undefined values. Nevertheless, this is enough to discover all sorts of memory management problems in a program. Some common errors are given below.
1.2.3.1. Use of uninitialized memory
Sources of uninitialized data are:
      local variables that have not been initialized.
      The contents of malloc'd blocks, before writing something there.
This is not a problem with calloc since it initializes each allocated bytes with 0. The new operator in C++ is similar to malloc. Fields of the created object will be uninitialized.
Sample program:


#include <stdlib.h>
int main()
{
        int p, t;
        if (p == 5)             /*Error occurs here*/
               t = p+1;
        return 0;
}
Here the value of p is uninitialized, therefore p may contain some random value (garbage), so an error may occur at the condition check. An uninitialized variable will cause error in 2 situations:
      When it is used to determine the outcome of a conditional branch. Eg:'if (p == 5)' in the above program.
      When it is used to generate a memory address. Eg: In the above program let there be an integer array a[10], and if you write 'a[p] = 1', it will generate an error.
1.2.3.2. Illegal read/write
Illegal read/write errors occurs when you try to read/write from/to an address that is not in the address range of your program.
Sample program:


#include <stdlib.h>
int main()
{
        int *p, i, a;
        p = malloc(10*sizeof(int));
        p[11] = 1;                /* invalid write error */
        a = p[11];                /* invalid read error */
                free(p);
        return 0;
}
Here you are trying to read/write from/to address (p+sizeof(int)*11) which is not allocated to the program.
1.2.3.3. Invalid free
Valgrind keeps track of blocks allocated to your program with malloc/new. So it can easily check whether argument to free/delete is valid or not.
Sample program:


#include <stdlib.h>
int main()
{
        int *p, i;
        p = malloc(10*sizeof(int));
        for(i = 0;i < 10;i++)
                p[i] = i;
        free(p);
        free(p);        /* Error: p has already been freed */
        return 0;
}
Valgrind checks the address, which is given as argument to free. If it is an address that has already been freed you will be told that the free is invalid.
1.2.3.4. Mismatched Use of Functions
In C++ you can allocate and free memory using more than one function, but the following rules must be followed:
      If allocated with malloc, calloc, realloc, valloc or memalign, you must deallocate with free.
      If allocated with new[], you must deallocate with delete[].
      If allocated with new, you must deallocate with delete.
Sample program:


#include <stdlib.h>
int main()
{
        int *p, i;
        p = ( int* ) malloc(10*sizeof(int));
        for(i = 0;i < 10;i++)
                p[i] = i;
        delete(p);                /* Error: function mismatch */
        return 0;
}
Output by valgrind is:


             ==1066== ERROR SUMMARY: 1 errors from 1 contexts (suppressed:
0 from 0)
             ==1066== malloc/free: in use at exit: 0 bytes in 0 blocks.
             ==1066== malloc/free: 1 allocs, 1 frees, 40 bytes allocated.
             ==1066== For a detailed leak analysis,  rerun with:
--leak-check=yes
             ==1066== For counts of detected errors, rerun with: -v
>From the above "ERROR SUMMARY" it is clear that there is 0 bytes in 0 blocks in use at exit, which means that the malloc'd have been freed by delete. Therefore this is not a problem in Linux, but this program may crash on some other platform.
1.2.3.5. Errors Occur Due to Invalid System Call Parameter
Valgrind checks all parameters to system calls.
Sample program:


#include <stdlib.h>
#include <unistd.h>
int main()
{
        int *p;
        p = malloc(10);
        read(0, p, 100);        /* Error: unaddressable bytes */
        free(p);
        return 0;
}



             ==1045== Syscall param read(buf) contains unaddressable
byte(s)
             ==1045==    at 0x4032AF44: __libc_read (in
/lib/i686/libc-2.2.2.so)
             ==1045==    by 0x4026D177: __libc_start_main
(../sysdeps/generic/libc-start.c:129)
             ==1045==    by 0x80483E1: read@@GLIBC_2.0 (in
/home/deepu/valg/a.out)
Here, buf = p contains the address of a 10 byte block. The read system call tries to read 100 bytes from standard input and place it at p. But the bytes after the first 10 are unaddressable.
1.2.3.6. Memory Leak Detection
Consider the following program:


#include <stdlib.h>
int main()
{
        int *p, i;
        p = malloc(5*sizeof(int));
        for(i = 0;i < 5;i++)
                p[i] = i;
        return 0;
}



             ==1048== LEAK SUMMARY:
             ==1048==    definitely lost: 20 bytes in 1 blocks.
             ==1048==    possibly lost:   0 bytes in 0 blocks.
             ==1048==    still reachable: 0 bytes in 0 blocks.
In the above program p contains the address of a 20-byte block. But it is not freed anywhere in the program. So the pointer to this 20 byte block is lost forever. This is known as memory leaking. We can get the leak summary by using the Valgrind option --leak-check=yes.
1.2.4. How to Suppress Errors
Valgrind detects numerous problems in many programs which come pre-installed on your GNU/Linux system. You can't easily fix these, but you don't want to see these errors (and yes, there are many!). So Valgrind reads a list of errors to suppress at startup, from a suppression file ending in .supp.
Suppression files may be modified. This is useful if part of your project contains errors you can't or don't want to fix, yet you don't want to continuously be reminded of them. The format of the file is as follows.


{
             Error name
             Type
             fun:function name, which contains the error to suppress
        fun:function name, which calls the function specified above
}



Error name can be any name.
             type=ValueN, if the error is an uninitialized value error.
                 =AddrN, if it is an address error.(N=sizeof(data type))
                 =Free, if it is a free error (eg:mismatched free)
                 =Cond, if error is due to uninitialized CPU condition code.
                 =Param, if it is an invalid system call parameter error.
You can then run the program with:


valgrind --suppressions=path/to/the/supp_file.supp testprog
The output will not contain the errors specified in the suppression file.
1.3. Limitations and Dependencies of Valgrind.
No software is free from limitations. The same is the case of Valgrind, however most programs work fine. The limitations are listed below.
  1. Program runs 25 to 50 times slower.
  2. Increased memory consumption.
  3. Highly optimized code (compiled with -O1, -O2 options ) may sometimes cheat Valgrind.
  4. Valgrind relies on dynamic linking mechanism.
Valgrind is closely tied to details of the CPU, operating system and to a less extent, compiler and basic C libraries. Presently Valgrind works only on the Linux platform (kernels 2.2.X or 2.4.X) on x86s. Glibc 2.1.X or 2.2.X is also required for Valgrind.

2. Let's Go Deeper

Valgrind simulates an Intel x86 processor and runs our test program in this synthetic processor. The two processors are not exactly same. Valgrind is compiled into a shared object, valgrind.so. A shell script valgrind sets the LD_PRELOAD environment variable to point to valgrind.so. This causes the .so to be loaded as an extra library to any subsequently executed dynamically-linked ELF binary, permitting the program to be debugged.
The dynamic linker calls the initialization function of Valgrind. Then the synthetic CPU takes control from the real CPU. In the memory there may be some other .so files. The dynamic linker calls the initialization function of all such .so files. Now the dynamic linker calls the main of the loaded program. When main returns, the synthetic CPU calls the finalization function of valgrind.so. During the execution of the finalization function, summary of all errors detected are printed and memory leaks are checked. Finalization function exits giving back the control from the synthetic CPU to the real one.

2.1. How Valgrind Tracks Validity of Each Byte

For every byte processed, the synthetic processor maintains 9 bits, 8 'V' bits and 1 'A' bit. The 'V' bits indicate the validity of the 8 bits in the byte and the 'A' bit indicates validity of the byte address. These valid-value(V) bits are checked only in two situations:
  1. when data is used for address generation,
  2. when control flow decision is to be made.
In any of these two situations, if the data is found to be undefined an error report will be generated. But no error reports are generated while copying or adding undefined data.
However the case with floating-point data is different. During a floating-point read instruction the 'V' bits corresponding to the data are checked. Thus copying of uninitialized value will produce error in case of floating-point numbers.



#include <stdlib.h>
int main()
{
        int *p, *a;
        p = malloc(10*sizeof(int));
        a = malloc(10*sizeof(int));
        a[3] = p[3];
        free(a);
        free(p);
        return 0;
}

/*  produce no errors */





#include <stdlib.h>
int main()
{
        float *p, *a;
        p = malloc(10*sizeof(float));
        a = malloc(10*sizeof(float));
        a[3] = p[3];
        free(a);
        free(p);
        return 0;
}

/* produces error */

All bytes that are in memory but not in CPU have an associated valid-address(A) bit, which indicates whether the corresponding memory location is accessible by the program. When a program starts, the 'A' bits corresponding to each global variables are set. When a call malloc, new or any other memory allocating function is made, the 'A' bits corresponding to the allocated bytes are set. Upon freeing the allocated block using free/new/new‘’ the corresponding 'A' bits are cleared. While doing a system call the 'A' bits are changed appropriately.
When values are loaded from memory the 'A' bits corresponding to each bytes are checked by Valgrind, and if the 'A' bit corresponding to a byte is set then its 'V' bits is checked. If the 'V' bits are not set, an error will be generated and the 'V' bits are set to indicate validity. This avoids long chain of errors. If the 'A' bit corresponding to a loaded byte is 0 then its 'V' bits are forced to set, despite the value being invalid.
Have a look on the following program. Run it.


#include <stdlib.h>
int main()
{
        int *p, j;
        p = malloc(5*sizeof(int));
        j = p[5];
        if (p[5] == 1)
                i = p[5]+1;
        free(p);
        return 0;
}
Here two errors occur. Both of them are due to the accessing address location p + sizeof(int)*5 which is not allocated to the program. During the execution of j = p[5], since the address p + sizeof(int)*5 is invalid, the 'V' bits of 4 bytes starting at location p+sizeof(int)*5 are forced to set. Therefore uninitialized value occurs neither during the execution of j = p[5] nor during the execution of if(p[5]==1).

2.2. Cache Profiling

Modern x86 machines use two levels of caching. These levels are L1 and L2, in which L1 is a split cache that consists of Instruction cache(I1) and Data cache(D1). L2 is a unified cache.
The configuration of a cache means its size, associativity and number of lines. If the data requested by the processor appears in the upper level it is called a hit. If the data is not found in the upper level, the request is called a miss. The lower level in the hierarchy is then accessed to retrieve the block containing requested data. In modern machines L1 is first searched for data/instruction requested by the processor. If it is a hit then that data/instruction is copied to some register in the processor. Otherwise L2 is searched. If it is a hit then data/instruction is copied to L1 and from there it is copied to a register. If the request to L2 also is a miss then main memory has to be accessed.
Valgrind can simulate the cache, meaning it can display the things that occur in the cache when a program is running. For this, first compile your program with -g option as usual. Then use the shell script cachegrind instead of valgrind.
Sample output:


==7436== I1  refs:      12,841
==7436== I1  misses:       238
==7436== L2i misses:       237
==7436== I1  miss rate:   1.85%
==7436== L2i miss rate:   1.84%
==7436==
==7436== D   refs:       5,914  (4,626 rd + 1,288 wr)
==7436== D1  misses:       357  (  324 rd +    33 wr)
==7436== L2d misses:       352  (  319 rd +    33 wr)
==7436== D1  miss rate:    6.0% (  7.0%   +   2.5%  )
==7436== L2d miss rate:    5.9% (  6.8%   +   2.5%  )
==7436==
==7436== L2 refs:          595  (  562 rd +    33 wr)
==7436== L2 misses:        589  (  556 rd +    33 wr)
==7436== L2 miss rate:     3.1% (  3.1%   +   2.5%  )



   L2i misses means the number of instruction misses that occur in L2
cache.
   L2d misses means the number of data misses that occur in L2 cache.
   Total number of data references = Number of reads + Number of writes.
   Miss rate means fraction of misses that are not found in the upper
level.
The shell script cachegrind also produces a file, cachegrind.out, that contains line-by-line cache profiling information which is not humanly understandable. A program vg_annotate can easily interpret this information. If the shell script vg_annotate is used without any arguments it will read the file cachegrind.out and produce an output which is humanly understandable.
When C, C++ or assembly source programs are passed as input to vg_annotate it displays the number of cache reads, writes, misses etc.


I1 cache:         16384 B, 32 B, 4-way associative
D1 cache:         16384 B, 32 B, 4-way associative
L2 cache:         262144 B, 32 B, 8-way associative
Command:          ./a.out
Events recorded:  Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Events shown:     Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Event sort order: Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw
Thresholds:       99 0 0 0 0 0 0 0 0
Include dirs:
User annotated:   valg_flo.c
Auto-annotation:  off
User-annotated source: valg_flo.c:


Ir I1mr I2mr Dr D1mr D2mr Dw D1mw D2mw

 .   .   .   .   .    .   .   .    .   #include<stdlib.h>
 .   .   .   .   .    .   .   .    .   int main()
 3   1   1   .   .    .   1   0    0   {
 .   .   .   .   .    .   .   .    .           float *p, *a;
 6   1   1   .   .    .   3   0    0           p = malloc(10*sizeof(float));
 6   0   0   .   .    .   3   0    0           a = malloc(10*sizeof(float));
 6   1   1   3   1    1   1   1    1           a[3] = p[3];
 4   0   0   1   0    0   1   0    0           free(a);
 4   0   0   1   0    0   1   0    0           free(p);
 2   0   0   2   0    0   .   .    .   }
      Ir = Total instruction cache reads.
      I1mr = I1 cache read misses.
      I2mr = L2 cache instruction read misses.


No comments:

Post a Comment