==========================================================
Submitter: Kayvan Memarian and Peter Sewell
Submission Date: 2016-09-22
Document: WG14 N2091
Related: Section 1 of N2012, Question 2/15 of our survey, Section 3.1 (Q47-48) of our N2013, and DR338.
This document is based on N2012 (Section 1), adding a concrete Technical Corrigendum proposal for discussion and revising the text.
In ISO C11 (following C99), trap representations are particular object representations that do not represent values of the object type; merely reading one (except by an lvalue of character type) is undefined behaviour. See 3.19.4, 6.2.6.1p5, 6.2.6.2p2, DR338. An "indeterminate value" is either a trap representation or an unspecified value.
Trap representations complicate the language and create the possibility of subtle bugs, e.g. if a programmer inadvertently constructs a trap representation and the resulting (unbounded) undefined behaviour is exploited by some unexpected compiler optimisation. It is not clear how significant they are in practice for current C implementations:
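The character-type exception mentioned above can be sketched as follows. This is a minimal illustration, not text from the standard: the helper name `copy_object_representation` is hypothetical, and the point is only that the bytes of any object may be examined through `unsigned char` without the read itself being undefined under 6.2.6.1p5.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the object representation of *obj (size bytes)
   into out. Every access is through an lvalue of type unsigned char, so
   6.2.6.1p5 does not make the read undefined even if the bytes happen to
   form a trap representation for the object's own type. */
static void copy_object_representation(const void *obj, size_t size,
                                       unsigned char *out)
{
    assert(obj != NULL && out != NULL);
    const unsigned char *bytes = obj;
    for (size_t i = 0; i < size; i++)
        out[i] = bytes[i];
}
```

A caller would pass `&x` and `sizeof x` for some object `x`; reading `x` through its own type is the operation that the trap-representation rule restricts.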
For most integer types it appears not: 6.2.6.1p5 makes clear that trap representations are particular concrete bit patterns, and in the most common implementations (which are 2's-complement and for most types use all the bits) there are no representation values that do not represent an abstract integer value. The only exception we are aware of is _Bool (as observed by Joseph Myers wrt GCC). It seems one could either (a) take non {0,1} values to be trap representations in the current sense, or (b) regard operations on non {0,1} values of that type as giving an unspecified value. The latter would bound possible misbehaviour, which would be good for programmers; the only possible downside we are aware of is that it could limit compilation via computed branch tables indexed by unchecked _Bool values.
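To make the _Bool case concrete, the sketch below checks whether some bytes match the representation of 0 or 1 without ever reading them through a _Bool lvalue. The function name `bool_bytes_are_canonical` is hypothetical, and the sketch assumes an implementation in which _Bool has no padding bits whose contents vary between objects (we believe this holds for mainstream implementations, but it is an assumption).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the sizeof(_Bool) bytes at p match the
   representation this implementation stores for 0 or for 1, and 0 otherwise.
   Only character-type accesses (inside memcmp) touch *p, so no read through
   a _Bool lvalue is performed on a possibly non-{0,1} representation. */
static int bool_bytes_are_canonical(const void *p)
{
    assert(p != NULL);
    const _Bool f = 0, t = 1;
    return memcmp(p, &f, sizeof(_Bool)) == 0 ||
           memcmp(p, &t, sizeof(_Bool)) == 0;
}
```

On an implementation where _Bool is one byte and 1 is stored as 0x01, a byte holding 2 would be rejected by this check; under reading (a) above, loading that byte as a _Bool would be undefined behaviour, while under (b) operations on it would give an unspecified value.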
There is the case of floating-point "Signalling NaNs" (again highlighted by Joseph Myers, esp. wrt x86 Extended Precision). This seems the most plausible case, but in many environments they seem to be disabled, and in others it seems that users would want IEEE behaviour, not undefined behaviour. As far as we can tell so far, these trap only when used, not when read, so one could again (b) regard operations on them as nondeterministically giving an unspecified value or trapping (bounding bad behaviour) rather than (a) regarding those representations as trap representations. Or, even more limited, they just get turned into quiet NaNs, in which case they could be modelled by unspecified values.
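As an illustration of signalling NaNs being particular concrete bit patterns, the sketch below classifies an IEEE 754 binary64 bit pattern held in an integer, so no floating-point object is ever loaded or operated on. The function names are hypothetical, and the sketch assumes the common convention in which a set most-significant fraction bit marks a quiet NaN; some architectures have historically used the opposite convention, and x86 Extended Precision has a different layout that is not covered here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: classify a binary64 bit pattern as a signalling NaN.
   A NaN has an all-ones exponent and a nonzero fraction; under the assumed
   convention it is signalling when the top fraction bit is clear. */
static int bits_are_signalling_nan(uint64_t bits)
{
    const uint64_t exponent_mask = UINT64_C(0x7FF0000000000000);
    const uint64_t fraction_mask = UINT64_C(0x000FFFFFFFFFFFFF);
    const uint64_t quiet_bit     = UINT64_C(0x0008000000000000);

    int is_nan = (bits & exponent_mask) == exponent_mask &&
                 (bits & fraction_mask) != 0;
    return is_nan && (bits & quiet_bit) == 0;
}

/* Small self-check of the classification above. */
static void check_nan_classification(void)
{
    assert(bits_are_signalling_nan(UINT64_C(0x7FF0000000000001)));  /* sNaN */
    assert(!bits_are_signalling_nan(UINT64_C(0x7FF8000000000000))); /* qNaN */
    assert(!bits_are_signalling_nan(UINT64_C(0x7FF0000000000000))); /* +inf */
}
```

Working on the integer pattern sidesteps the question discussed above of whether merely reading such a value through a floating-point lvalue may trap.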
There has been much discussion of the Itanium NaT flag, but that is not a memory-value-representable entity, so it is orthogonal to trap representations. Our impression (after email with Hans Boehm) is that this may need all reads of uninitialised values (perhaps except padding) to nondeterministically give either an unspecified value or trap. It seems this is probably independent of whether the address of the read variable is taken, leaving the point of that clause of the standard unclear.
There is the possibility of pointers in segmented architectures in which reading a pointer value does some dynamic check. Derek Jones reports that 68000 does this and still exists - though as it is no longer manufactured, it is not clear that future C standards should take it into account.
In an ideal world we'd remove trap representations from the standard, following (b) for the first two bullets above; we'd then also be able to remove the concept of "indeterminate value", leaving just "unspecified value".
If that is not feasible, e.g. because of the segmented-architecture pointer case, we suggest making the sets of trap representation values for each type be implementation-defined, thereby requiring implementations to document which representations are trap representations - and hence, in the common case that there are none, to document that. That will simplify the task of reasoning about C programs in that case, removing the uncertainty about whether an implementation might be treating some representation values as trap representations.
Change 6.2.6.1p5 from
5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.
to read
5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation. The set of trap representations for each object type is an implementation-defined set.
_Bool
To replace the use of trap representations for non {0,1} _Bool values by unspecified values for the result of operations on such values, one could make a change as below. Such values are converted (by the integer promotion rules) to other integer types before they are operated on, so the unspecified value can be introduced just at the conversion point (and then propagated as in N2089 by the operations).
Extend 6.3.1.2 Boolean type from
1 When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.59)
to:
1 When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.59) When a value of _Bool type is converted to any other scalar type, if the value is not 0 or 1 the result is an unspecified value.
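For reference, the existing first sentence of 6.3.1.2p1 can be exercised as in the sketch below; the wrapper names are hypothetical and exist only for illustration. The proposed additional sentence cannot be demonstrated portably, since creating a non {0,1} _Bool and reading it is exactly the case whose behaviour is under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrappers: each converts a scalar to _Bool. Per 6.3.1.2p1 the
   result is 0 if the value compares equal to 0, and 1 otherwise. */
static _Bool int_to_bool(int v)             { return (_Bool)v; }
static _Bool double_to_bool(double v)       { return (_Bool)v; }
static _Bool pointer_to_bool(const void *p) { return (_Bool)p; }

/* Small self-check of the conversions above. */
static void check_bool_conversions(void)
{
    int object = 0;
    assert(int_to_bool(0) == 0);
    assert(int_to_bool(2) == 1);            /* any nonzero int gives 1 */
    assert(double_to_bool(0.5) == 1);       /* 0.5 does not compare equal to 0 */
    assert((int)0.5 == 0);                  /* conversion to int truncates instead */
    assert(pointer_to_bool(NULL) == 0);
    assert(pointer_to_bool(&object) == 1);
}
```

The contrast between `(_Bool)0.5` and `(int)0.5` shows that conversion to _Bool is a comparison against zero rather than a truncation, which is why a stored _Bool is expected to hold only 0 or 1 in the first place.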