Thursday, June 23, 2011

Bitwise Operators in C

Introduction


When you learn to program in a high-level language like C (although C is fairly low-level, as high-level languages go), the idea is to avoid worrying too much about the hardware. You want the ability to represent mathematical abstractions, such as sets, etc. and have high level language features like threads, higher-order functions, exceptions.

High-level languages, for the most part, try to make you as unaware of the hardware as possible. Clearly, this isn't entirely true, because efficiency is still a major consideration for some programming languages.

C, in particular, was created to make it easier to write operating systems. Rather than write UNIX in assembly, which is slow process (because assembly code is tedious to write), and not very portable (because assembly is specific to an ISA), the goal was to have a language that provided good control-flow, some abstractions (structures, function calls), and could be efficiently compiled and run quickly. Writing operating systems requires the manipulation of data at addresses, and this requires manipulating individual bits or groups of bits.

That's where two sets of operators are useful: bitwise operators and bitshift operators.

You can find these operators in C, C++, and Java (and presumably C#, since it's basically Java). Bitwise operators allow you to read and manipulate bits in variables of certain types.
Even though such features are available in C, they aren't often taught in an introductory level programming course. That's because intro level courses prefer to emphasize abstraction. With many departments using Java, there's a trend to increase what's abstract, and not get into the representation. For example, some languages have support for stacks, queues, hashtables, and so forth. These "canned" data structures are meant to provide you, the programmer, with objects that perform certain tasks, while relieving you of the tedium and detail of understanding how the data is represented. Nevertheless, if you intend to do some work in systems programming, or other forms of low-level coding (operating systems, device drivers, socket programming, network programming), knowing how to access and manipulate bits is important.

Generic Bitwise Operations

Bitwise operators only work on a limited number of types: int and char. This seems restrictive--and it is restrictive, but it turns out we can gain some flexibility by doing some C "tricks".
It turns out there's more than one kind of int. In particular, there's unsigned int, there's short int, there's long int, and then unsigned versions of those ints. The "C" language does not specify the difference between a short int, an int and a long int, except to state that: 

sizeof( short int ) <= sizeof( int ) <= sizeof( long )

You will find that these sizes vary from ISA to ISA, and possibly even compiler to compiler. The sizes do not have to be distinct. That means all three sizes could be the same, or two of three could be the same, provided that the above restrictions are held.
Bitwise operators fall into two categories: binary bitwise operators and unary bitwise operators. Binary operators take two arguments, while unary operators only take one.
We're going to discuss binary operators first, and we'll do in the context of a fake binary operator called @. Thus, we write x @ y to perform bitwise @ on x and y.
Bitwise operators, like arithmetic operators, do not change the value of the arguments. Instead, a temporary value is created. This can then be assigned to a variable. For example, when you write x + y, neither x nor y have their value changed.
Let's assume that we are doing bitwise @ on two unsigned int variables. Let's assume that unsigned ints use 32 bits of memory. We define two variables X and Y where

   X = x31x30....x0
   Y = y31y30....y0
We want to be able to refer to each bit of X and Y, which we do so by writing out the bits by writing the variable name in lowercase and adding the appropriate subscript for thie bit number.The subscripts follow the convention we've used so far. The least significant bit has a subscript of 0, and is the rightmost bit. The most significant bit has a subscript of N-1 (for N bits), and is the leftmost bit. In this case N = 32 so the MSb has a subscript of 31.

Let's define @1 to be the @ operation performed on two individual bits. @ is performed on all 32 bits simultaneously. 
Furthermore, let's call the temporary value created, T. Normally, this temporary value does not have a name. It is generated while the program is running, and used in other computations. Thus, when you write c = a + b, a temporary value is created for the sum a + b, and this temporary value then gets copied into c. Of course, for efficiency, the temporary value might not be generated, and the value written directly to c. For more complex statements, such as c = (a + b) - d, temporary values are often created to store intermediate results. These temporary values are created by the runtime system, and are not given variable names.
We define Z = X @ Y as:

   zi = xi @1 yi, for 0 <= i <= N - 1

In words, to compute the ith bit of Z, you should perform the operation on the ith bit of X and ith bit of Y.


Bitwise AND
This makes more sense if we apply this to a specific operator. In C/C++/Java, the & operator is bitwise AND. The following is a chart that defines &1, defining AND on individual bits. 

  xi  
  yi  
xi &1 yi
0
0
0
0
1
0
1
0
0
1
1
1

We can do an example of bitwise &. It's easiest to do this on 4 bit numbers, however. 

Variable
b3
b2
b1
b0
x
1
1
0
0
y
1
0
1
0
z = x & y
1
0
0
0


Bitwise OR
The | operator is bitwise OR (it's a single vertical bar). The following is a chart that defines |1, defining OR on individual bits.
  xi  
  yi  
xi |1 yi
0
0
0
0
1
1
1
0
1
1
1
1

We can do an example of bitwise |. It's easiest to do this on 4 bit numbers, however. 

Variable
b3
b2
b1
b0
x
1
1
0
0
y
1
0
1
0
z = x | y
1
1
1
0


Bitwise XOR
The ^ operator is bitwise XOR. The usual bitwise OR operator is inclusive OR. XOR is true only if exactly one of the two bits is true. The XOR operation is quite interesting, but we defer talking about the interesting things you can do with XOR until the next set of notes.
The following is a chart that defines ^1, defining XOR on individual bits.
  xi  
  yi  
xi ^1 yi
0
0
0
0
1
1
1
0
1
1
1
0

We can do an example of bitwise ^. It's easiest to do this on 4 bit numbers, however. 

Variable
b3
b2
b1
b0
x
1
1
0
0
y
1
0
1
0
z = x ^ y
0
1
1
0


Bitwise NOT
There's only one unary bitwise operator, and that's bitwise NOT. Bitwise NOT flips all of the bits.
There's not that much to say about it, other than it's not the same operation as unary minus.
The following is a chart that defines ~1, defining NOT on an individual bit.
  xi  
~1 xi
0
1
1
0
We can do an example of bitwise ~. It's easiest to do this on 4 bit numbers (although only 2 bits are necessary to show the concept).
Variable
b3
b2
b1
b0
x
1
1
0
0
z = ~x
0
0
1
1


Uses of Bitwise Operations
Occasionally, you may want to implement a large number of Boolean variables, without using a lot of space.
A 32-bit int can be used to stored 32 Boolean variables. Normally, the minimum size for one Boolean variable is one byte. All types in C must have sizes that are multiples of bytes. However, only one bit is necessary to represent a Boolean value.
You can also use bits to represent elements of a (small) set. If a bit is 1, then element i is in the set, otherwise it's not.
You can use bitwise AND to implement set intersection, bitwise OR to implement set union.
In upcoming notes, you'll learn how to find the value of individual bits of a char or int.

Facts About Bitwise Operators
Consider the expression x + y. Do either x or y get modified? The answer is no.
Most built-in binary operators do not modify the values of the arguments. This applies to logical operators too. They don't modify their arguments.
There are operators that do assignment such as +=, -=, *=, and so on. They apply to logical operators too. For example, |=, &=, ^=. Nearly all binary operators have a version with = after it.


Summary
Bitwise operators only work on limited types: int and char (and variations of int). You can, with some casting, make it work on other types. These operators, in conjunction with bitshift operators allow you to access and modify bits.

2 comments:

  1. Thanks nice document but if possible then please provide documentation which explain some tricky solutions of bitwise .

    ReplyDelete
  2. Thanks dear reader,

    We will sure add what you are looking for.

    Regards,
    Saurabh Gupta

    ReplyDelete