A race condition is where the behavior of code depends on the interleaving of multiple threads. This is perhaps the most fundamental problem with multi-threaded programming.
When analyzing or writing single-threaded code we only have to think about the sequence of statements right in front of us; we can assume that data will not magically change between statements. However, with improperly written multi-threaded code non-local data can change unexpectedly due to the actions of another thread.
Race conditions can result in a high-level logical fault in your program, or (more excitingly) it may even pierce C++'s statement-level abstraction. That is, we cannot even assume that single C++ statements execute atomically because they may compile to multiple assembly instructions. In short, this means that we cannot guarantee the outcome of a statement such as
foo += 1;
foois non-local and may be accessed from multiple threads.
A contrived example follows.
Listing 1. A logical race condition
Now imagine that we start a number of threads, all executing
workerThread(). If we have just one thread,
doSomeWork()is going to be executed the correct number of times (whatever
sharedCounterstarts out at).
However, with more than one thread
doSomeWork()will most likely be executed too many times. Exactly how many times depends on the number of threads spawned, computer architecture, operating system scheduling and... chance. The problem arises because we do not test and update
sharedCounteras an atomic operation, so there is a period where the value of
sharedCounteris incorrect. During this time other threads can pass the test when they really shouldn't have.
The value of
sharedCounteron exit tells us how many extra times
doSomeWork()is called. With a single thread, the final value of
sharedCounteris of course 0. With multiple threads running, it will be between 0 and -N where N is the number of threads.
Moving the update adjacent to the test will not make these two operations atomic. The window during which
sharedCounteris out of date will be smaller, but the race condition remains. An illustration of this non-solution follows:
Listing 2. Still a race condition
The solution is to use a mutex to synchronize the threads with respect to the test and update. Another way of saying this is that we need to define a critical section in which we both test and update the
sharedCounter. The next section introduces mutex and solves the example race condition.