Tuesday, January 21, 2014

c++ stl algorithm

c++ stl algorithm : Standard Template Library Tutorial, ccplusplus.com
The STL provides several standard algorithms for the processing of elements of collections. These algorithms offer general fundamental services, such as searching, sorting, copying, reordering, modifying, and numeric processing.

Algorithms are not member functions of the container classes. Instead, they are global functions that operate with iterators. This has an important advantage: Instead of each algorithm being implemented for each container type, all are implemented only once for any container type. The algorithm might even operate on elements of different container types. You can also use the algorithms for user-defined container types. All in all, this concept reduces the amount of code and increases the power and the flexibility of the library.

Note that this is not an object-oriented programming paradigm; it is a generic functional programming paradigm. Instead of data and operations being unified, as in object-oriented programming, they are separated into distinct parts that can interact via a certain interface. However, this concept also has its price: First, the usage is not intuitive. Second, some combinations of data structures and algorithms might not work. Even worse, a combination of a container type and an algorithm might be possible but not useful (for example, it may lead to bad performance). Thus, it is important to learn the concepts and the pitfalls of the STL to benefit from it without abusing it. I provide examples and more details about this throughout the rest of this post.

Let's start with a simple example of the use of STL algorithms. Consider the following program, which shows some algorithms and their usage:

To be able to call the algorithms, you must include the header file <algorithm>:

The first two algorithms called are min_element() and max_element(). They are called with two parameters that define the range of the processed elements. To process all elements of a container you simply use begin() and end(). Both algorithms return an iterator for the minimum and maximum elements respectively. Thus, in the statement

the min_element() algorithm returns the position of the minimum element (if there is more than one, the algorithm returns the first). The next statement prints that element:

Of course, you could do both in one statement:

The next algorithm called is sort(). As the name indicates, it sorts the elements of the range defined by the two arguments. As usual, you could pass an optional sorting criterion. The default sorting criterion is operator <. Thus, in this example all elements of the container are sorted in ascending order:

So afterward, the vector contains the elements in this order:

The find() algorithm searches for a value inside the given range. In this example, it searches the first element that is equal to the value 3 in the whole container:

If the find() algorithm is successful, it returns the iterator position of the element found. If it fails, it returns the end of the range, the past-the-end iterator, which is passed as the second argument. In this example, the value 3 is found as the third element, so afterward pos refers to the third element of coll.

The last algorithm called in the example is reverse(), which reverses the elements of the passed range. Here the third element that was found by the find() algorithms and the past-the-end iterator are passed as arguments:

This call reverses the order of the third element up to the last one. The output of the program is as follows:


All algorithms process one or more ranges of elements. Such a range might, but is not required to, embrace all elements of a container. Therefore, to be able to handle subsets of container elements, you pass the beginning and the end of the range as two separate arguments rather than the whole collection as one argument.

This interface is flexible but dangerous. The caller must ensure that the first and second arguments define a valid range. This is the case if the end of the range is reachable from the beginning by iterating through the elements. This means, it is up to the programmer to ensure that both iterators belong to the same container and that the beginning is not behind the end. If this is not the case, the behavior is undefined and endless loops or forbidden memory access may result. In this respect, iterators are just as unsafe as ordinary pointers. But note that undefined behavior also means that an implementation of the STL is free to find such kinds of errors and handle them accordingly. The following paragraphs show that ensuring that ranges are valid is not always as easy as it sounds.

Every algorithm processes half-open ranges. Thus, a range is defined so that it includes the position used as the beginning of the range but excludes the position used as the end. This concept is often described by using the traditional mathematical notations for half-open ranges:
We use the first alternative in this series

The half-open range concept has the advantages that were described on page 84 (it is simple and avoids special handling for empty collections). However, it also has some disadvantages. Consider the following example:

In this example, the collection is initialized with integral values from 20 to 40. When the search for an element with the value 3 fails, find() returns the end of the processed range (coll.end() in this example) and assigns it to pos. Using that return value as the beginning of the range in the following call of reverse() poses no problem because it results in the following call:

This is simply a call to reverse an empty range. Thus, it is an operation that has no effect (a so-called "no-op").

However, if find() is used to find the first and the last elements of a subset, you should consider that passing these iterator positions as a range will exclude the last element. So, the first call of max_element()

finds 34 and not 35:

To process the last element, you have to pass the position that is one past the last element:

Doing this yields the correct result:

Note that this example uses a list as the container. Thus, you must use operator ++ to get the position that is behind pos35. If you have random access iterators, as with vectors and deques, you also could use the expression pos35 + 1. This is because random access iterators allow "iterator arithmetic".

Of course, you could use pos25 and pos35 to find something in that subrange. Again, to search including pos35 you have to pass the position after pos35. For example:

All the examples in this section work only because you know that pos25 is in front of pos35. Otherwise, [pos25,pos35) would not be a valid range. If you are not sure which element is in front, things are getting more complicated and undefined behavior may occur.

Suppose you don't know whether the element that has value 25 is in front of the element that has value 35. It might even be possible that one or both values are not present. By using random access iterators, you can call operator < to check this:

However, without random access iterators you have no simple, fast way to find out which iterator is in front. You can only search for one iterator in the range of the beginning to the other iterator or in the range of the other iterator to the end. In this case, you should change your algorithm as follows: Instead of searching for both values in the whole source range, you should try to find out, while searching for them, which value comes first. For example:

In contrast to the previous version, here you don't search for pos35 in the full range of all elements of coll. Instead, you first search for it from the beginning to pos25. Then, if it's not found, you search for it in the part that contains the remaining elements after pos25. As a result you know which iterator position comes first and which subrange is valid.

This implementation is not very efficient. A more efficient way to find the first element that either has value 25 or value 35 is to search exactly for that. You could do this by using some abilities of the STL that are not introduced yet as follows:

Here, a special expression is used as a sorting criterion that allows a search of the first element that has either value 25 or value 35.

Handling Multiple Ranges

Several algorithms process more than one range. In this case you usually must define both the beginning and the end only for the first range. For all other ranges you need to pass only their beginnings. The ends of the other ranges follow from the number of elements of the first range. For example, the following call of equal() compares all elements of the collection coll1 element-by-element with the elements of coll2 beginning with its first element:

Thus, the number of elements of coll2 that are compared with the elements of coll1 is specified indirectly by the number of elements in coll1.

This leads to an important consequence: When you call algorithms for multiple ranges, make sure the second and additional ranges have at least as many elements as the first range. In particular, make sure that destination ranges are big enough for algorithms that write to collections!
Consider the following program:

Here, the copy() algorithm is called. It simply copies all elements of the first range into the destination range. As usual, for the first range, the beginning and the end are defined, whereas for the second range, only the beginning is specified. However, the algorithm overwrites rather than inserts. So, the algorithm requires that the destination has enough elements to be overwritten. If there is not enough room, as in this case, the result is undefined behavior. In practice, this often means that you overwrite whatever comes after the coll2.end(). If you're in luck, you'll get a crash; at least then you'll know that you did something wrong. However, you can force your luck by using a safe version of the STL for which the undefined behavior is defined as leading to a certain error handling procedure.

To avoid these errors, you can (1) ensure that the destination has enough elements on entry, or (2) use insert iterators. Insert iterators are covered incoming posts, We'll first explain how to modify the destination so that it is big enough on entry.

To make the destination big enough, you must either create it with the correct size or change its size explicitly. Both alternatives apply only to sequence containers (vectors, deques, and lists). This is not really a problem because associative containers cannot be used as a destination for purposes for overwriting algorithms. The following program shows how to increase the size of containers:

Here, resize() is used to change the number of elements in the existing container coll2:

coll3 is initialized with a special initial size so that it has enough room for all elements of coll1:

Note that both resizing and initializing the size create new elements. These elements are initialized by their default constructor because no arguments are passed to them. You can pass an additional argument both for the constructor and for resize() to initialize the new elements.

See Also:


No comments:

Post a Comment