Monday, July 11, 2011

How the compiler works

This topic describes in more detail how GCC transforms source files to an executable file. Compilation is a multi-stage process involving several tools, including the GNU Compiler itself (through the gcc or g++ frontends), the GNU Assembler as, and the GNU Linker ld. The complete set of tools used in the compilation process is referred to as a toolchain.

11.1 An overview of the compilation process

The sequence of commands executed by a single invocation of GCC consists of the following stages:
  • preprocessing (to expand macros)
  • compilation (from source code to assembly language)
  • assembly (from assembly language to machine code)
  • linking (to create the final executable)
As an example, we will examine these compilation stages individually using the Hello World program ‘hello.c’:

#include <stdio.h>
int main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

Note that it is not necessary to use any of the individual commands described in this section to compile a program. GCC runs all of them automatically, and they can be seen using the -v option described earlier (see section 9.3 Verbose compilation). The purpose of this chapter is to provide an understanding of how the compiler works.
Although the Hello World program is very simple, it uses external header files and libraries, and so exercises all the major steps of the compilation process.

11.2 The preprocessor

The first stage of the compilation process is the use of the preprocessor to expand macros and included header files. To perform this stage, GCC runs the equivalent of the following command:

$ cpp hello.c > hello.i

The result is a file ‘hello.i’ which contains the source code with all macros expanded and the contents of the included header files inserted. By convention, preprocessed files are given the file extension ‘.i’ for C programs and ‘.ii’ for C++ programs. In practice, the preprocessed file is not saved to disk unless the -save-temps option is used.
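To see macro expansion on its own, it helps to preprocess a file that includes no headers. The sketch below uses a hypothetical file ‘greet.c’ with two made-up macros, GREETING and TWICE. It runs the preprocessor through gcc -E, which stops after the preprocessing stage, and adds -P to leave out the line markers that the preprocessor normally writes.

```shell
# A small source file with two illustrative macros (names are our own).
cat > greet.c <<'EOF'
#define GREETING "Hello, world!\n"
#define TWICE(x) ((x) + (x))
int n = TWICE(21);
const char *msg = GREETING;
EOF

# -E stops after preprocessing; -P omits line markers from the output.
gcc -E -P greet.c -o greet.i

# Show the preprocessed source: the macro names are gone.
cat greet.i
```

In the output, the #define lines have disappeared and each use of a macro has been replaced by its definition, so TWICE(21) appears as ((21) + (21)).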

11.3 The compiler

The next stage of the process is the actual compilation of preprocessed source code to assembly language, for a specific processor. The command-line option -S instructs gcc to convert the preprocessed C source code to assembly language without creating an object file: 

$ gcc -Wall -S hello.i

The resulting assembly language is stored in the file ‘hello.s’. Here is what the Hello World assembly language for an Intel x86 (i686) processor looks like:  

$ cat hello.s
    .file  "hello.c"
    .section  .rodata
.LC0:
    .string  "Hello, world!\n"
    .text
.globl main
    .type  main, @function
main:
    pushl  %ebp
    movl  %esp, %ebp
    subl  $8, %esp
    andl  $-16, %esp
    movl  $0, %eax
    subl  %eax, %esp
    movl  $.LC0, (%esp)
    call  printf
    movl  $0, %eax
    leave
    ret
    .size  main, .-main
    .ident  "GCC: (GNU) 3.3.1"

Note that the assembly language contains a call to the external function printf.

11.4 The assembler

The purpose of the assembler is to convert assembly language into machine code and generate an object file. When there are calls to external functions in the assembly source file, the assembler leaves the addresses of the external functions undefined, to be filled in later by the linker. The assembler can be invoked with the following command line: 
 
$ as hello.s -o hello.o

As with GCC, the output file is specified with the -o option. The resulting file ‘hello.o’ contains the machine instructions for the Hello World program, with an undefined reference to printf.  

11.5 The linker

The final stage of compilation is the linking of object files to create an executable. In practice, an executable requires many external functions from system and C run-time (crt) libraries. Consequently, the actual link commands used internally by GCC are complicated. For example, the full command for linking the Hello World program is: 

$ ld -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o \
 /usr/lib/crti.o /usr/lib/gcc-lib/i686/3.3.1/crtbegin.o \
 -L/usr/lib/gcc-lib/i686/3.3.1 hello.o -lgcc -lgcc_eh \
 -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i686/3.3.1/crtend.o \
 /usr/lib/crtn.o

Fortunately there is never any need to type the command above directly. The entire linking process is handled transparently by gcc when invoked as follows:

$ gcc hello.o 

This links the object file ‘hello.o’ to the C standard library, and produces an executable file ‘a.out’:
 
$ ./a.out 
Hello, world!

An object file for a C++ program can be linked to the C++ standard library in the same way with a single g++ command.
