Tuesday, January 10, 2012

stl remove algorithm

stl remove algorithm

The remove() algorithm removes elements from a range. However, if you use it for all elements of a container it operates in a surprising way. Consider the following example:

Someone reading this program without deeper knowledge would expect that all elements with value 3 are removed from the collection. However, the output of the program is as follows:

pre: 6 5 4 3 2 1 1 2 3 4 5 6
post: 6 5 4 2 1 1 2 4 5 6 5 6

Thus, remove() did not change the number of elements in the collection for which it was called. The end() member function returns the old end, whereas size() returns the old number of elements. However, something has changed: The elements changed their order as if the elements were removed. Each element with value 3 was overwritten by the following elements (Figure below).
At the end of the collection, the old elements that were not overwritten by the algorithm remain unchanged. Logically, these  elements no longer belong to the collection.

Figure: How remove() Operates
However, the algorithm does return the new end. By using it, you can access the resulting range, reduce the size of the collection, or process the number of removed elements. Consider the following modified version of the example:

In this version, the return value of remove() is assigned to the iterator end:

list<int>::iterator end = remove (coll.begin(), coll.end(),3);

This is the new logical end of the modified collection after the elements are "removed." You can use this return value as the new end for further operations:

copy (coll.begin(), end,
ostream_iterator<int>(cout," "));

Another possibility is to process the number of "removed" elements by processing the distance between the "logical" and the real ends of the collection:

cout << "number of removed elements: "
<< distance(end,coll.end()) << endl;

Here, a special auxiliary function for iterators, distance(), is used. It returns the distance between two iterators. If the iterators were random access iterators you could process the difference directly with operator -. However, the container is a list, so it provides only bidirectional iterators.

Note: The definition of distance() has changed, so in older STL versions you must include this version of distance(),

template <class Iterator>
inline long distance (Iterator pos1, Iterator pos2)
{
long d = 0;
distance (pos1, pos2, d);
return d;

If you really want to remove the "removed" elements, you must call an appropriate member function of the container. To do this, containers provide the erase() member function, erase() removes all elements of the range that is specified by its arguments:

coll.erase (end, coll.end());

Here is the output of the whole program:
6 5 4 3 2 1 1 2 3 4 5 6
6 5 4 2 1 1 2 4 5 6
number of removed elements: 2
6 5 4 2 1 1 2 4 5 6

If you really want to remove elements in one statement, you can call the following statement:

coll.erase (remove(coll.begin(),coll.end(),3),
coll.end());

Why don't algorithms call erase() by themselves? Well, this question highlights the price of the flexibility of the STL. The STL separates data structures and algorithms by using iterators as the interface. However, iterators are an abstraction to represent a position in a container. In general,iterators do not know their containers. Thus, the algorithms, which use the iterators to access the elements of the container, can't call any member function for it.

This design has important consequences because it allows algorithms to operate on ranges that are different from "all elements of a container." For example, the range might be a subset of all elements of a collection. And, it might even be a container that provides no erase() member function (ordinary arrays are an example of such a container). So, to make algorithms as flexible as possible, there are good reasons not to require that iterators know their container.

Note that it is often not necessary to remove the "removed" elements. Often, it is no problem to use the returned new logical end instead of the real end of the container. In particular, you can call all algorithms with the new logical end.

See Also: