Thursday, December 8, 2011

stl range

Ranges in stl

All algorithms process one or more ranges of elements. Such a range might, but is not required to, embrace all elements of a container. Therefore, to be able to handle subsets of container elements, you pass the beginning and the end of the range as two  separate arguments rather than the whole collection as one argument.

This interface is flexible but dangerous. The caller must ensure that the first and second arguments define a valid range. This is the case if the end of the range is reachable from the beginning by iterating through the elements. This means, it is up to the programmer to ensure that both iterators belong to the same container and that the beginning is not behind the end. If this is not the case, the behavior is undefined and endless loops or  forbidden memory access may result. In this respect, iterators are just as unsafe as ordinary pointers. But note that undefined behavior also means that an implementation of the STL is free to find such kinds of errors and handle them accordingly. The following paragraphs show that ensuring that ranges are valid is not always as easy as it sounds. 

Every algorithm processes half-open ranges. Thus, a range is defined so that it includes the position used as the beginning of the range but excludes the position used as the end. This concept is often described by using the traditional mathematical notations for half-open ranges: 

[begin,end)
or
[begin,end[

I use the first alternative in this book.

The half-open range concept has the advantages that were described here(it is simple and avoids special handling for empty collections). However, it also has some disadvantages.  Consider the following example:


In this example, the collection is initialized with integral values from 20 to 40. When the search for an element with the value 3 fails, find() returns the end of the processed range (coll.end() in this example) and assigns it to pos. Using that return value as the beginning of the range in the following call of reverse() poses no problem because it results in the following call:

reverse (coll.end(), coll.end());

This is simply a call to reverse an empty range. Thus, it is an operation that has no effect (a socalled "no-op").

However, if find() is used to find the first and the last elements of a subset, you should consider that passing these iterator positions as a range will exclude the last element. So, the first call of max_element()

max_element (pos25, pos35)

finds 34 and not 35:

max: 34

To process the last element, you have to pass the position that is one past the last element:

max_element (pos25, ++pos35)

Doing this yields the correct result:

max: 35

Note that this example uses a list as the container. Thus, you must use operator ++ to get the position that is behind pos35. If you have random access iterators, as with vectors and deques, you also could use the expression pos35 + 1. This is because random access iterators allow "iterator arithmetic.

Of course, you could use pos25 and pos35 to find something in that subrange. Again, to search including pos35 you have to pass the position after pos35. For example:

//increment pos35 to search with its value included
++pos35;
pos30 = find(pos25,pos35, //range
30); //value
if (pos30 == pos35) {
   cout << "30 is in NOT the subrange" << endl;
}
else {
   cout << "30 is in the subrange" << endl;
}

All the examples in this section work only because you know that pos25 is in front of pos35.

Otherwise, [pos25,pos35) would not be a valid range. If you are not sure which element is in front, things are getting more complicated and undefined behavior may occur.

Suppose you don't know whether the element that has value 25 is in front of the element that has value 35. It might even be possible that one or both values are not present. By using random access iterators, you can call operator < to check this:

if (pos25 < pos35) {
   //only [pos25,pos35) is valid
   ...
}
else if (pos35 < pos25) {
   //only [pos35,pos25) is valid
   ...
}
else {
   //both are equal, so both must be end()
   ...
}

However, without random access iterators you have no simple, fast way to find out which iterator is in front. You can only search for one iterator in the range of the beginning to the other iterator or in the range of the other iterator to the end. In this case, you should change your algorithm as follows: Instead of searching for both values in the whole source range, you should try to find out, while searching for them, which value comes first. For example:

pos25 = find (coll.begin(), coll.end(), //range
25); //value
pos35 = find (coll.begin(), pos25, //range
35); //value
if (pos35 != pos25) {
 /*
  * pos35 is in front of pos25
  * so, only [pos35,pos25) is valid
  */
  ...
}
else {
   pos35 = find (pos25, coll.end(), //range
   35); //value
   if (pos35 != pos25) {
  /*pos25 is in front of pos35
   *so, only [pos25,pos35) is valid
   */
   ...
}
else {
   // both are equal, so both must be end()
   ...
}
}
In contrast to the previous version, here you don't search for pos35 in the full range of all elements of coll. Instead, you first search for it from the beginning to pos25. Then, if it's not found, you search for it in the part that contains the remaining elements after pos25. As a result you know which iterator position comes first and which subrange is valid.

This implementation is not very efficient. A more efficient way to find the first element that either has value 25 or value 35 is to search exactly for that. You could do this by using some abilities of the STL that are not introduced yet as follows:

pos = find_if (coll.begin(), coll.end(), //range
compose_f_gx_hx(logical_or<bool>(), //criterion
bind2nd(equal_to<int>(), 25),
bind2nd(equal_to<int>(), 35)));
switch (*pos) {
   case 25:
      //element with value 25 comes first
      pos25 = pos;
      pos35 = find (++pos, coll.end(), //range
      35); //value
      ...
      break;
   case 35:
      //element with value 35 comes first
      pos35 = pos;
      pos25 = find (++pos, coll.end(), //range
      25); //value
      ...
      break;
   default:
      //no element with value 25 or 35 found
      ...
      break;
}
Here, a special expression is used as a sorting criterion that allows a search of the first element that has either value 25 or value 35. The expression is a combination of several predefined function objects.

See Also:


No comments:

Post a Comment