Thursday, June 30, 2011

Determining the Size of a Class Object



There are many factors that decide the size of an object of a class in C++. These factors are:
  1. Size of all non-static data members
  2. Order of data members
  3. Byte alignment or byte padding
  4. Size of its immediate base class
  5. The existence of virtual function(s) (Dynamic polymorphism using virtual functions).
  6. Compiler being used
  7. Mode of inheritance (virtual inheritance)

Size of all non-static data members
Only non-static data members will be counted for calculating sizeof class/object.
class A {
private:
        float iMem1;
        const int iMem2;
        static int iMem3;
        char iMem4;
};
For an object of class A, the size will be the size of float iMem1 + size of int iMem2 + size of char iMem4, plus any padding the compiler inserts (with 4-byte float and int, that is 9 bytes of data padded to 12). Static members are not part of the class object: they are stored separately and are not included in the object's layout.
Order of data members
The order in which one specifies data members also alters the size of the class.
class C {
        char c;
        int int1;
        int int2;
        int i;
        long l;
        short s;
};
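The size of this class is 24 bytes (assuming 4-byte int and long and 2-byte short, as on typical 32-bit platforms). Even though char c consumes only 1 byte, 4 bytes are allocated for it, and the remaining 3 bytes are wasted (holes). This is because the next member is an int, which takes 4 bytes and should start at an address that is a multiple of 4. If the integer were stored right after the char instead of at the next 4-byte boundary (offset 4), reading or modifying it could take 2 memory accesses instead of 1. The trailing short is likewise followed by 2 bytes of padding so that the total stays a multiple of 4: 4 + 4 + 4 + 4 + 4 + 2 + 2 = 24. The compiler does this for us, unless we specify a different byte alignment (packing).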

If I rewrite the above class in a different order, keeping all the same data members, like below:
class C {
        int int1;
        int int2;
        int i;
        long l;
        short s;
        char c;
};
Now the size of this class is 20 bytes.

In this case, the short and the char share the last 4-byte slot: s takes 2 bytes, c takes 1, and only 1 byte of padding is left at the end (4 + 4 + 4 + 4 + 2 + 1 = 19, rounded up to 20).
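The offsets behind these numbers can be checked directly. The sketch below is an illustration, not part of the original example: it rewrites both versions of class C as structs named CharFirst and CharLast (hypothetical names; struct is used so that offsetof, which is only guaranteed for standard-layout types, can be applied) and prints where members start. The figures depend on the platform: with a 4-byte long you should see 24 and 20 as above, while on a typical 64-bit Linux build, where long is 8 bytes and 8-byte aligned, both versions come out at 32 bytes.

```cpp
#include <cstddef>  // offsetof
#include <cstdio>   // std::printf

// The first version of class C (char first), as a struct so offsetof applies.
struct CharFirst {
    char  c;
    int   int1;
    int   int2;
    int   i;
    long  l;
    short s;
};

// The reordered version of class C (char last).
struct CharLast {
    int   int1;
    int   int2;
    int   i;
    long  l;
    short s;
    char  c;
};

// Prints each type's size and where some members start, so the padding
// holes between members become visible.
void PrintLayouts() {
    std::printf("CharFirst: size=%zu c@%zu int1@%zu l@%zu s@%zu\n",
                sizeof(CharFirst), offsetof(CharFirst, c),
                offsetof(CharFirst, int1), offsetof(CharFirst, l),
                offsetof(CharFirst, s));
    std::printf("CharLast:  size=%zu l@%zu s@%zu c@%zu\n",
                sizeof(CharLast), offsetof(CharLast, l),
                offsetof(CharLast, s), offsetof(CharLast, c));
}
```

In CharFirst, int1 starts at offset 4 rather than 1, which is exactly the 3-byte hole described above.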
Byte alignment or byte padding
As mentioned above, if we specify 1-byte alignment (for example with #pragma pack(1), which both MSVC and GCC accept), the size of class C above will be 19 bytes in both cases, since no padding is inserted.
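A minimal sketch of 1-byte packing, assuming a compiler that accepts #pragma pack(push, 1) and #pragma pack(pop) (MSVC, GCC and Clang do). The struct names and kMemberSum are hypothetical. The static_asserts check that, with no padding, the size equals the plain sum of the member sizes: 19 bytes with a 4-byte long, 23 bytes with an 8-byte long.

```cpp
#include <cstddef>  // std::size_t

// Request 1-byte packing for the structs between push and pop.
#pragma pack(push, 1)
struct PackedCharFirst {
    char  c;
    int   int1;
    int   int2;
    int   i;
    long  l;
    short s;
};
struct PackedCharLast {
    int   int1;
    int   int2;
    int   i;
    long  l;
    short s;
    char  c;
};
#pragma pack(pop)  // restore the previous packing for everything after this

// With no padding, the size is just the sum of the member sizes,
// so member order no longer matters.
constexpr std::size_t kMemberSum =
    sizeof(char) + 3 * sizeof(int) + sizeof(long) + sizeof(short);

static_assert(sizeof(PackedCharFirst) == kMemberSum, "no padding expected");
static_assert(sizeof(PackedCharLast) == kMemberSum, "no padding expected");
```

Packing trades speed for space: packed members can be slower to access, and taking a pointer to one may be unsafe on architectures that trap on misaligned loads, so it is usually reserved for data that must match an external layout.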
Size of its immediate base class
The size of a class also includes size of its immediate base class.

Let's take an example:
class B {
        // ...
        int iMem1;
        int iMem2;
};

class D : public B {
        // ...
        int iMem;
};
In this case, sizeof(D) also includes the size of B, so it will be 12 bytes (assuming 4-byte int).
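The claim can be checked at compile time with static_assert (C++11). This sketch is the B/D example above written with struct so the members are public; the assertion assumes no extra padding is needed, which holds for these all-int layouts on common compilers.

```cpp
// Same members as class B above.
struct B {
    int iMem1;
    int iMem2;
};

// Same members as class D above.
struct D : public B {
    int iMem;
};

// A D object holds a complete B subobject followed by D's own int.
static_assert(sizeof(D) == sizeof(B) + sizeof(int),
              "D = B subobject + one int");
```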
The existence of virtual function(s)
The existence of virtual function(s) adds a virtual table pointer (vptr) to the class, which increases its size by the size of a pointer (4 bytes on a 32-bit platform, 8 bytes on a 64-bit one). However, if the base class already has virtual function(s), either directly or through its own base class, then additional virtual functions in the derived class won't add anything to its size: in single inheritance, the virtual table pointer is shared across the class hierarchy. For example:
class Base {
public:
        // ...
        virtual void SomeFunction(...);
private:
        int iAMem;
};

class Derived : public Base {
public:
        // ...
        virtual void SomeOtherFunction(...);
private:
        int iBMem;
};
In the example above, on a 32-bit platform sizeof(Base) will be 8 bytes, that is sizeof(int iAMem) + sizeof(vptr). sizeof(Derived) will be 12 bytes, that is sizeof(int iBMem) + sizeof(Base). Notice that the virtual function declared in class Derived doesn't add anything more: Derived simply sets the inherited vptr to point to its own virtual function table.
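A compile-time check of both points, as a sketch: Plain is a hypothetical class with the same int but no virtual functions, and Base gets a virtual destructor, as is usual for polymorphic base classes. Vptrs are the technique mainstream compilers use rather than something the C++ standard requires, and with 8-byte pointers the exact sizes differ from the 32-bit figures above, so the static_asserts only check the relationships.

```cpp
// Same data as Base, but no virtual functions: no vptr.
struct Plain {
    int iAMem;
};

// One int plus virtual functions: objects also carry a vptr.
struct Base {
    virtual ~Base() {}
    virtual void SomeFunction() {}
    int iAMem;
};

// Adds another virtual function but no data members.
struct Derived : Base {
    virtual void SomeOtherFunction() {}
};

// The vptr makes Base bigger than Plain by at least one pointer...
static_assert(sizeof(Base) >= sizeof(Plain) + sizeof(void*), "vptr added");
// ...but Derived reuses Base's vptr, so a new virtual function adds nothing.
static_assert(sizeof(Derived) == sizeof(Base), "vptr is shared");
```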
Compiler being used
In some scenarios, the size of a class object can be compiler specific. Let's take one example:
class BaseClass {
        int a;
        char c;
};

class DerivedClass : public BaseClass {
        char d;
        int i;
};
If compiled with the Microsoft C++ compiler, the size of DerivedClass is 16 bytes. If compiled with gcc (either c++ or g++), size of DerivedClass is 12 bytes.

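The reason sizeof(DerivedClass) is 16 bytes with MSVC is that it lays out the complete, padded BaseClass subobject (8 bytes: int a, char c and 3 bytes of padding) and starts the members of DerivedClass after it, at a 4-byte-aligned offset, so that accessing them stays efficient (again, the memory read/write cycle). GCC, following the Itanium C++ ABI, can reuse the 3 bytes of tail padding at the end of BaseClass for char d (this applies here because BaseClass has private data members and so is not a POD type), so d lands right after c, int i starts at offset 8, and the total is 12 bytes.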
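The sketch below reproduces the example with a small function (PrintSizes, a hypothetical name) that prints both sizes, so the same file can be compiled with MSVC and with GCC to compare. The members are left private on purpose, since the tail-padding reuse described above depends on BaseClass not being a POD type.

```cpp
#include <cstdio>  // std::printf

class BaseClass {
    int a;
    char c;   // followed by 3 bytes of tail padding (with 4-byte int)
};

class DerivedClass : public BaseClass {
    char d;   // GCC/Clang may place this inside BaseClass's tail padding
    int i;
};

// Print the sizes so the result can be compared across compilers.
void PrintSizes() {
    std::printf("sizeof(BaseClass)    = %zu\n", sizeof(BaseClass));
    std::printf("sizeof(DerivedClass) = %zu\n", sizeof(DerivedClass));
}
```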
Mode of inheritance (virtual inheritance)
In C++, we sometimes have to use virtual inheritance. (One classic example is the implementation of a final class in C++.) When we use virtual inheritance, each class that inherits virtually carries the overhead of a virtual base class pointer, which is 4 bytes on a 32-bit platform.
class ABase{
        int iMem;
};

class BBase : public virtual ABase {
        int iMem;
};

class CBase : public virtual ABase {
        int iMem;
};

class ABCDerived : public BBase, public CBase {
        int iMem;
};
And if you check the size of these classes on a 32-bit platform (4-byte int and pointers), it will be:
  • Size of ABase : 4
  • Size of BBase : 12
  • Size of CBase : 12
  • Size of ABCDerived : 24
Because BBase and CBase derive from ABase virtually, each of them also has a virtual base pointer, which adds 4 bytes to their size: sizeof(ABase) + sizeof(int) + sizeof(virtual base pointer) = 12.

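The size of ABCDerived is 24, not 28 (= sizeof(BBase) + sizeof(CBase) + sizeof(int)), because with virtual inheritance ABCDerived contains only one shared ABase subobject instead of one per path. It holds the BBase part (virtual base pointer + int), the CBase part (virtual base pointer + int), its own int, and a single ABase (int): six 4-byte fields, 24 bytes. Each of the two parts still keeps its own virtual base pointer; what is saved is the duplicate ABase.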
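With 8-byte pointers the numbers above grow (and alignment padding may appear), but the key property, a single shared ABase, holds everywhere. The sketch below checks that property directly by converting an ABCDerived* to ABase* through both the BBase and the CBase paths and comparing the results; SharesOneABase, PrintVirtualBaseSizes and DemoVirtualBase are hypothetical helper names.

```cpp
#include <cassert>  // assert
#include <cstdio>   // std::printf

class ABase {
    int iMem;
};

class BBase : public virtual ABase {
    int iMem;
};

class CBase : public virtual ABase {
    int iMem;
};

class ABCDerived : public BBase, public CBase {
    int iMem;
};

// True when the ABase reached via BBase and via CBase is the same object,
// i.e. ABCDerived contains exactly one ABase subobject.
bool SharesOneABase() {
    ABCDerived d;
    ABase* viaB = static_cast<BBase*>(&d);  // ABCDerived* -> BBase* -> ABase*
    ABase* viaC = static_cast<CBase*>(&d);  // ABCDerived* -> CBase* -> ABase*
    return viaB == viaC;
}

// Prints the sizes so they can be compared with the 32-bit figures above.
void PrintVirtualBaseSizes() {
    std::printf("ABase=%zu BBase=%zu CBase=%zu ABCDerived=%zu\n",
                sizeof(ABase), sizeof(BBase), sizeof(CBase),
                sizeof(ABCDerived));
}

void DemoVirtualBase() {
    assert(SharesOneABase());  // one shared ABase, not two
    PrintVirtualBaseSizes();
}
```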

Wednesday, June 29, 2011

Code Coverage using Gcov


How Gcov works
Gcov [GCOV01] is a code coverage analysis tool that is part of the GNU Compiler Collection (GCC).
The easiest way to get started with gcov, on a modern Unix operating system, is to do the following:
1- ./configure --with-your-options CFLAGS="-static -fprofile-arcs -ftest-coverage"
2- make
3- ./your_binary
4- gcov main.c
The .bb file contains a list of source files (including headers), the functions within those files, and the line numbers corresponding to each basic block in the source file. The .bbg file contains a list of the program flow arcs for each function, which, in combination with the .bb file, enables gcov to reconstruct the program flow. At runtime, the counter vector entries are incremented every time an instrumented basic block is entered, and when the program exits it dumps the counter information into the .da file (it writes the size of the vector followed by the counters themselves). This is further documented in [GCOV01] Section 8.4: « Brief description of gcov data files ».

Notes:
(a) One can notice that those files have changed since gcc 3.4 (the default cc on FreeBSD 6.x). Only one file, .gcno, is now created at build time; when the instrumented program exits, another file called .gcda is created, containing the results.
(b) Another interesting change appears in gcc 3.3, where __bb_fork_func has been renamed __gcov_flush.
To start with, we recommend that you statically link your application and, preferably, build it without optimisations.

Note:
Dynamic libraries and the constructor functions __bb_init_func and __bb_fork_func:
If you want to avoid an error such as «Undefined symbol "__bb_init_func"», use static binaries.
Explanation: when compiling with the special Gcov flags "-fprofile-arcs" and "-ftest-coverage", the linking of dynamic libraries may not work as expected.
Then, in our case, we need to move some files generated by gcc on the build host (the .bb and .bbg files and the source code) to the custom firewall (our coverage/testing host).
Another point: Gcov requires the source tree to be exactly the same as the one on the build host. From the gcov(1) manpage:

« gcov should be run with the current directory the same as that when you
invoked the compiler. Otherwise it will not be able to locate the source
files. »
Fortunately, you do not have to worry about where on the filesystem you run the binary: it is statically linked, and the path of the directory where it was built is hardcoded during the compilation process. Running strings(1) against an instrumented binary will confirm that. Nevertheless, that directory path needs to be recreated on the coverage/testing host before you launch the binary; otherwise it won't be able to create the .da files at the end of its execution.

Example: arc profiling: Can't open output file /home/update/hping3-apha1-pre2/sendrawip.da

Having to relocate those files is somewhat ugly and may be avoided once we switch to gcc 3.4, which introduces cross-profiling features [GCOV02].
After it runs, gcov produces .gcov files containing the original source code annotated with execution counts.
The first column of this file indicates the number of times each line was executed during the tests. Lines starting with the ##### string indicate lines that have never been executed (i.e. not covered by the regression test), and the ones starting with the - string are lines without code. A sample gcov output file:

-: 0:Source:testssl.c
-: 0:Object:testssl.bb
-: 1:#include <stdio.h>
-: 2:
-: 3:#define OPENSSL_THREAD_DEFINES
-: 4:#include <openssl/opensslconf.h>
-: 5:
-: 6:
1: 7:int main() {
-: 8:#if defined(THREADS)
-: 9: printf("SSL has threads\n");
-: 10:#else
1: 11: printf("SSL has no threads\n");
call 0 returns 100%
-: 12:#endif
-: 13:
-: 14:}

As you can see, gcov only knows about what ends up in the binary, so we get 100% coverage with this trivial example even though one printf() is never called (it was removed at compile time by the C preprocessor).
This is logical but needs to be well understood, especially for portability, since the same code can yield different coverage metrics from one OS to another.
Developers should read [GCOV01] §8.3: « Using gcov with GCC Optimization ».
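To try the workflow on something smaller than a full project, a test program like the one below can be used. It is a hypothetical example, not from the original setup: RunDemo() only takes the non-negative path, so the negative branch should be reported with #####, and the line inside #if defined(THREADS) should be shown with -, as in the sample above. The commands in the comment assume a GCC that supports --coverage (a synonym for -fprofile-arcs -ftest-coverage when compiling, plus -lgcov when linking); compiling with -c first keeps the .gcno/.gcda names matching the source name.

```cpp
// Hypothetical build/run steps, with coverage_demo.cpp containing this code
// plus: int main() { return RunDemo() == 1 ? 0 : 1; }
//   g++ --coverage -O0 -c coverage_demo.cpp
//   g++ --coverage coverage_demo.o -o coverage_demo
//   ./coverage_demo
//   gcov coverage_demo.cpp      # writes coverage_demo.cpp.gcov
#include <cassert>  // assert
#include <cstdio>   // std::printf

// Returns -1 for negative input and 1 otherwise.
int Classify(int value) {
    if (value < 0) {
        std::printf("negative\n");  // not reached by RunDemo(): shown as #####
        return -1;
    }
#if defined(THREADS)
    std::printf("threads\n");       // compiled out unless THREADS is defined
#endif
    std::printf("non-negative\n");
    return 1;
}

// Exercises only the non-negative path.
int RunDemo() {
    int result = Classify(42);
    assert(result == 1);
    return result;
}
```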

User Interface
Because gcov only creates ASCII text files, the last stage was to use some parsing tools to generate human-readable reports. As usual, we did not want to reinvent the wheel, and we finally chose lcov [LTP01] from the Linux Testing Project for this part. Lcov automates the process of extracting the coverage data using Gcov and producing HTML results based on that data.
This tool is licensed under the terms of the GPL v2 and deals well with large projects (for example, tproxyd is linked with no less than 22 libraries!).

> geninfo --no-checksum --directory appdir --capture --output-filename tproxyd.info
> genhtml -o /export/lcov/tproxyd tproxyd.info

Another project similar to lcov is ggcov [GGCOV01]; it implements some features missing in lcov, like the ability to quickly see which functions in the source code were never called. This feature and the doxygen documentation could be a great help when writing unit tests to increase the code coverage metric. Last but not least, the real benefit of code coverage analysis is that it can be used to analyze and improve the coverage provided by a test suite. In these terms, code coverage is necessary but not sufficient.

[GCOV01] http://gcc.gnu.org/onlinedocs/gcc-3.0/gcc_8.html
[GCOV02] http://gcc.gnu.org/onlinedocs/gcc/Cross_002dprofiling.html#Cross_002dprofiling
[LTP01] http://ltp.sourceforge.net/coverage/lcov.readme.php
[GGCOV01] http://ggcov.sourceforge.net/
[20] http://www-128.ibm.com/developerworks/linux/library/l-stress/
[21] http://archive.linuxsymposium.org/ols2003/Proceedings/All-Reprints/Reprint-Larson-OLS2003.pdf

Monday, June 27, 2011

Smart Pointers

Introduction

What are smart pointers? The short answer: a smart pointer is a pointer which is smart. What does that mean? Smart pointers are objects which behave like pointers but do more than a pointer. They are as flexible as pointers and have the advantages of being objects (for example, their constructors and destructors are called automatically). A smart pointer is designed to handle the problems caused by using normal pointers (hence the name smart).

Problems with Pointers

What are the common problems we face in C++ programs while using pointers? The answer is memory management. Have a look at the following code.

char* pName  = new char[1024];
SetName(pName);
if(NULL != pName)
{
      delete[] pName;
}

How many times have we found a bug caused by forgetting to delete pName? It would be great if somebody took care of releasing the memory when the pointer is no longer needed (we are not talking about a garbage collector here). What if the pointer itself took care of that? Yes, that's exactly what a smart pointer is intended to do. Let us write a smart pointer and see how we can handle a pointer better.
We shall start with a realistic example. Let's say we have a class called Person, which is defined as listed below.
#include <cstdio>  // std::printf

class Person
{
   int age;
   const char* pName;  // not owned; points to a string literal here

   public:
      Person(): age(0), pName(0)
      {
      }
      Person(const char* pName, int age): age(age), pName(pName)
      {
      }
      ~Person()
      {
      }

      void Display()
      {
          std::printf("Name = %s Age = %d \n", pName, age);
      }
      void Shout()
      {
         std::printf("Ooooooooooooooooo\n");
      }
 };

/*
 *Now we shall write the client code to use Person
 */
  int main()
   {
       Person* pPerson  = new Person("Scott", 25);
       pPerson->Display();
       delete pPerson;
   }

Now look at this code: every time I create a pointer, I need to take care of deleting it, and this is exactly what I want to avoid. I need some automatic mechanism which deletes the pointer. One thing that comes to mind is a destructor. Pointers do not have destructors, but our smart pointer can have one. So we will create a class called SP which can hold a pointer to the Person class and will delete the pointer when its destructor is called. Hence my client code will change to something like this.
  int main()
   {
       SP p(new Person("Scott", 25));
       p->Display();
       // Don't need to delete Person pointer..

   }

Note the following things:
We have created an object of class SP which holds our Person class pointer. Since the destructor of the SP class will be called when this object goes out of scope, it will delete the Person class pointer (as its main responsibility); hence we don't have the pain of deleting the pointer.
One more thing of major importance is that we should be able to call the Display method using the SP class object the way we used to call it using the Person class pointer, i.e. the class should behave exactly like a pointer.
Interface for a smart pointer:
Since the smart pointer should behave like a pointer, it should support the same interface as pointers do, i.e. it should support the following operations:
Dereferencing (operator *)
Member access through the pointer (operator ->)
Let us write the SP class now
  class SP
   {
   private:
       Person*    pData; // pointer to person class

   public:
       SP(Person* pValue) : pData(pValue)
       {
       }
       ~SP()
       {
           // pointer no longer required
           delete pData;
       }

       Person& operator* ()
       {
           return *pData;
       }
       Person* operator-> ()
       {   
           return pData;
       }
   };

This class is our smart pointer class. The main responsibility of this class is to hold a pointer to Person class, and then delete it when its destructor is called. It should also support the interface of the pointer.

Generic smart pointer class

One problem we see here is that we can use this smart pointer class for pointers to the Person class only. This means that we would have to write a separate smart pointer class for each type, which is not practical. We can solve this problem by using templates to make this smart pointer class generic. So let us change the code like this.
  template < typename T > class SP
   {
       private:
       T*    pData; // Generic pointer to be stored

       public:
       SP(T* pValue) : pData(pValue)
       {
       }
       ~SP()
       {
           delete pData;
       }

       T& operator* ()
       {
           return *pData;
       }
  
       T* operator-> ()
       {
           return pData;
       }
   };

   int main()
   {
       SP<Person> p(new Person("Scott", 25));
       p->Display();
       // Don't need to delete Person pointer..

   }

Now we can use our smart pointer class for any type of pointers. So is our smart pointer really smart? Check the following code segment.
  int main()
   {
       SP<Person> p(new Person("Scott", 25));
       p->Display();
       {
           SP<Person> q = p;
           q->Display();
           // Destructor of q will be called here..
       }
       p->Display();
   }

Look what happens here: p and q refer to the same Person object, so when q goes out of scope, the destructor of q is called and deletes the Person object. Now we cannot safely call p->Display(); p is left with a dangling pointer, so this call has undefined behavior, and when p itself goes out of scope it deletes the same object a second time. (Note that the same problem exists with normal pointers whenever two pointers share an object and one of them is used to delete it.) We should not delete the Person object until nobody is using it. How do we do that? Implementing a reference counting mechanism in our smart pointer class will solve this problem.

Reference counting

What we are going to do is we will have a reference counting class RC. This class will maintain an integer value which represents the reference count. We will have methods to increment and decrement the reference count.
  class RC
   {
       private:
          int count; // Reference count
       public:
       RC() : count(0)
       {
           // Start from zero; SP calls AddRef() right after creating an RC
       }

       void AddRef()
       {
           // Increment the reference count
           count++;
       }

       int Release()
       {
           // Decrement the reference count and
           // return the reference count.
           return --count;
       }
   };

Now that we have a reference counting class, we will introduce it into our smart pointer class. We will maintain a pointer to class RC in our SP class, and this pointer will be shared by all instances of the smart pointer which refer to the same pointer. For this to happen, we need an assignment operator and a copy constructor in our SP class.
  template < typename T > class SP
   {
   private:
       T*    pData;       // pointer
       RC* reference; // Reference count

   public:
       SP() : pData(0), reference(0)
       {
           // Create a new reference
           reference = new RC();
           // Increment the reference count
           reference->AddRef();
       }
       SP(T* pValue) : pData(pValue), reference(0)
       {
           // Create a new reference
           reference = new RC();
           // Increment the reference count
           reference->AddRef();
       }

       SP(const SP<T>& sp) : pData(sp.pData), reference(sp.reference)
       {
           // Copy constructor
           // Copy the data and reference pointer
           // and increment the reference count
           reference->AddRef();
       }

       ~SP()
       {
           // Destructor
           // Decrement the reference count
           // if reference become zero delete the data

           if(reference->Release() == 0)
           {
               delete pData;
               delete reference;
           }
       }

       T& operator* ()
       {
           return *pData;
       }
  
       T* operator-> ()
       {
           return pData;
       }
  
       SP<T>& operator = (const SP<T>& sp)
       {
           // Assignment operator
           if (this != &sp) // Avoid self assignment
           {
               // Decrement the old reference count
               // if reference become zero delete the old data

               if(reference->Release() == 0)
               {
                   delete pData;
                   delete reference;
               }

               // Copy the data and reference pointer
               // and increment the reference count

               pData = sp.pData;
               reference = sp.reference;
               reference->AddRef();
           }
           return *this;
       }
   };

Let us have a look at the client code.
  int main()
   {
       SP<Person> p(new Person("Scott", 25));
       p->Display();
       {
           SP<Person> q = p;
           q->Display();
           SP<Person> r;
           r = p;
           r->Display();
           // Destructors of r and q will be called here..
       }
       p->Display();
       // Destructor of p will be called here
       // and person pointer will be deleted
   }

When we create a smart pointer p of type Person, the constructor of SP is called, the data is stored and a new RC object is created. The AddRef method of RC is called to increment the reference count to 1. Now SP<Person> q = p; creates a new smart pointer q using the copy constructor; here the data is copied and the reference count is incremented to 2. Then r = p; calls the assignment operator to assign the value of p to r (releasing the RC that r's default constructor had created); here too we copy the data and increment the reference count, making the count 3. When r and q go out of scope, the destructors of the respective objects are called and the reference count is decremented, but the data is not deleted until the reference count becomes zero, which happens only when the destructor of p is called. Hence our data is deleted only when nobody is referring to it.
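The same walkthrough can be replayed with std::shared_ptr from <memory> (C++11), the standard library's reference-counted smart pointer, whose use_count() exposes the count. This is a sketch for comparison, not the SP class above: Tracked, g_destroyed and TraceUseCounts are hypothetical names, and Tracked stands in for Person so we can observe when the object is deleted.

```cpp
#include <cassert>  // assert
#include <memory>   // std::shared_ptr
#include <vector>   // std::vector

static int g_destroyed = 0;  // number of Tracked objects destroyed so far

// Hypothetical stand-in for Person that records its own destruction.
struct Tracked {
    ~Tracked() { ++g_destroyed; }
};

// Replays the client code above with std::shared_ptr<Tracked> in place of
// SP<Person> and records use_count() after each step.
std::vector<long> TraceUseCounts() {
    int before = g_destroyed;
    std::vector<long> counts;
    std::shared_ptr<Tracked> p(new Tracked());
    counts.push_back(p.use_count());      // 1: p
    {
        std::shared_ptr<Tracked> q = p;   // copy constructor
        counts.push_back(p.use_count());  // 2: p, q
        std::shared_ptr<Tracked> r;       // empty: owns nothing
        r = p;                            // copy assignment
        counts.push_back(p.use_count());  // 3: p, q, r
    }                                     // r and q are destroyed here
    counts.push_back(p.use_count());      // back to 1: p
    assert(g_destroyed == before);        // object still alive while p owns it
    return counts;
}                                         // p destroyed: the object is deleted
```

One design difference: a default-constructed std::shared_ptr owns nothing and allocates no count, whereas SP's default constructor allocates an RC with a count of 1 that the assignment r = p then releases.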

Applications

Memory leaks: 
Using smart pointers reduces the work of managing pointers to avoid memory leaks. Now you can create a pointer and forget about deleting it; the smart pointer will do that for you. This is the simplest garbage collector we could think of.
Exceptions: 
Smart pointers are very useful where exceptions are used. For example look at the following code.

  void MakeNoise()
   {
       Person* p = new Person("Scott", 25);
       p->Shout();
       delete p;
   }

We are using a normal pointer here and deleting it after use, so everything looks okay. But what if our Shout function throws an exception? Then delete p; will never be called, and we have a memory leak. Let us handle that.
  void MakeNoise()
   {
       Person* p = new Person("Scott", 25);
       try
       {
           p->Shout();
       }
       catch(...)
       {
           delete p;
           throw;
       }
       delete p;
   }

Isn't it an overhead to catch an exception just to rethrow it? This code becomes cumbersome if you create many pointers. How will a smart pointer help here? Let's have a look at the same code using a smart pointer.
  void MakeNoise()
   {
       SP<Person> p(new Person("Scott", 25));
       p->Shout();
   }

We are making use of a smart pointer here, so we don't need to catch the exception. If the Shout method throws an exception, stack unwinding happens for the function, and during unwinding the destructors of all local objects are called; hence the destructor of p is called, which releases the memory, so we are safe. This makes smart pointers very useful here.
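The effect can be observed with a small experiment. This sketch uses std::unique_ptr from <memory> (C++11) instead of SP<Person>, since only the destructor matters here; NoisyPerson, g_deleted and DeletedDuringUnwinding are hypothetical names, and NoisyPerson::Shout() always throws so the unwinding path is taken.

```cpp
#include <cassert>    // assert
#include <memory>     // std::unique_ptr
#include <stdexcept>  // std::runtime_error

static int g_deleted = 0;  // number of NoisyPerson objects destroyed so far

// Hypothetical stand-in for Person whose Shout() always throws.
struct NoisyPerson {
    ~NoisyPerson() { ++g_deleted; }
    void Shout() { throw std::runtime_error("too loud"); }
};

// Same shape as the smart-pointer MakeNoise() above, with std::unique_ptr.
void MakeNoise() {
    std::unique_ptr<NoisyPerson> p(new NoisyPerson());
    p->Shout();  // throws; p's destructor runs during stack unwinding
}

// Calls MakeNoise(), catches the exception, and returns how many
// NoisyPerson objects were destroyed along the way.
int DeletedDuringUnwinding() {
    int before = g_deleted;
    try {
        MakeNoise();
        assert(false && "Shout() should have thrown");
    } catch (const std::runtime_error&) {
        // Expected: Shout() always throws.
    }
    return g_deleted - before;  // 1 if the smart pointer cleaned up
}
```

Stack unwinding is only guaranteed when the exception is caught somewhere; if it escapes main(), std::terminate is called and whether local destructors run is implementation-defined.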
Conclusion
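Smart pointers are useful for writing safe and efficient code in C++. Make use of smart pointers and take advantage of the automatic memory management they provide. Take a look at Scott Meyers' auto_ptr implementation and at std::auto_ptr in the C++ standard library (C++11 deprecates std::auto_ptr in favor of std::unique_ptr and std::shared_ptr).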