Thursday, April 9, 2015

Why are some things left undefined in C++?

Because machines differ and because C left many things undefined. For details, including definitions of the terms “undefined”, “unspecified”, “implementation defined”, and “well-formed”; see the ISO C++ standard. Note that the meaning of those terms differ from their definition of the ISO C standard and from some common usage. You can get wonderfully confused discussions when people don’t realize that not everybody shares definitions.

This is a correct, if unsatisfactory, answer. Like C, C++ is meant to exploit hardware directly and efficiently. This implies that C++ must deal with hardware entities such as bits, bytes, words, addresses, integer computations, and floating-point computations the way they are on a given machine, rather than how we might like them to be. Note that many “things” that people refer to as “undefined” are in fact “implementation defined”, so that we can write perfectly specified code as long as we know which machine we are running on. Sizes of integers and the rounding behavior of floating-point computations fall into that category.

Consider what is probably the the best known and most infamous example of undefined behavior:

The C++ (and C) notion of array and pointer are direct representations of a machine’s notion of memory and addresses, provided with no overhead. The primitive operations on pointers map directly onto machine instructions. In particular, no range checking is done. Doing range checking would impose a cost in terms of run time and code size. C was designed to outcompete assembly code for operating systems tasks, so that was a necessary decision. Also, C – unlike C++ – has no reasonable way of reporting a violation had a compiler decided to generate code to detect it: There are no exceptions in C. C++ followed C for reasons of compatibility and because C++ also compete directly with assembler (in OS, embedded systems, and some numeric computation areas). If you want range checking, use a suitable checked class (vector, smart pointer, string, etc.). A good compiler could catch the range error for a[100] at compile time, catching the one for p[100] is far more difficult, and in general it is impossible to catch every range error at compile time.
Other examples of undefined behavior stems from the compilation model. A compiler cannot detect an inconsistent definition of an object or a function in separately-compiled translation units. For example:

Compiling file1.c and file2.c and linking the results into the same program is illegal in both C and C++. A linker could catch the inconsistent definition of S, but is not obliged to do so (and most don’t). In many cases, it can be quite difficult to catch inconsistencies between separately compiled translation units. Consistent use of header files helps minimize such problems and there are some signs that linkers are improving. Note that C++ linkers do catch almost all errors related to inconsistently declared functions.
Finally, we have the apparently unnecessary and rather annoying undefined behavior of individual expressions. For example:

The value of j is unspecified to allow compilers to produce optimal code. It is claimed that the difference between what can be produced giving the compiler this freedom and requiring “ordinary left-to-right evaluation” can be significant. Leading experts are unconvinced, but with innumerable compilers “out there” taking advantage of the freedom and some people passionately defending that freedom, a change would be difficult and could take decades to penetrate to the distant corners of the C and C++ worlds. It is disappointing that not all compilers warn against code such as ++i+i++. Similarly, the order of evaluation of arguments is unspecified.

There is a sentiment that too many “things” are left undefined, unspecified, implementation-defined, etc. To address this, the ISO C++ committee has created Study Group 12 to review and recommend wide-ranging tightening-up to reduce undefined, unspecified, and implementation-defined behavior.

No comments:

Post a Comment