==========================================================
Document: N2220
Related: N2223: Clarifying the C Memory Object Model: Introduction to N2219 - N2222, N2091, Section 1 of N2012, Question 2/15 of our survey, Section 3.1 (Q47-48) of our N2013, and DR338.
This document revises N2091, which itself was based on N2012 (Section 1), adding a concrete Technical Corrigendum proposal for discussion and revising the text.
In ISO C11 (following C99), trap representations are particular object representations that do not represent values of the object type; merely reading one (except through an lvalue of character type) is undefined behaviour. See 3.19.4, 6.2.6.1p5, 6.2.6.2p2, DR338. An "indeterminate value" is either a trap representation or an unspecified value.
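The character-type exception in 6.2.6.1p5 can be illustrated with a minimal sketch. The helper name `copy_representation` is hypothetical (it is not from the standard or from this proposal); it reads the bytes of an arbitrary object only through `unsigned char` lvalues, which is the one kind of read that stays defined even if the object happens to hold a trap representation.

```c
#include <stddef.h>

/* Hypothetical helper: copy the object representation of *obj (n bytes)
   into out, reading only through unsigned char lvalues.
   Returns the number of bytes copied. */
static size_t copy_representation(const void *obj, size_t n, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)obj;
    for (size_t i = 0; i < n; i++)
        out[i] = p[i];   /* character-type read: not UB under 6.2.6.1p5 */
    return n;
}
```

A caller would pass the address and size of any object, e.g. `copy_representation(&x, sizeof x, buf)`, and can then inspect `buf` without ever reading `x` through an lvalue of its own type.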
Trap representations complicate the language. Misconceptions about trap representations and their relationship to unspecified values seem common: confusion between the ISO notion that trap representations give UB when read and the idea that they give machine traps when read; confusion with the quite different Itanium NaT concept; and a belief, held by some, that object types without any unused representations should nonetheless be regarded as potentially having trap representations. They create the possibility of subtle bugs, e.g. if a programmer inadvertently constructs a trap representation and the resulting (unbounded) undefined behaviour is exploited by some unexpected compiler optimisation. It is not clear how significant they are in practice for current C implementations:
For most integer types it appears that trap representations are not significant: 6.2.6.1p5 makes clear that trap representations are particular concrete bit patterns, and in the most common implementations (which are 2's-complement and for most types use all the bits) there are no representation values that do not represent an abstract integer value. The only exception we are aware of is _Bool (as observed by Joseph Myers wrt GCC). It seems one could either (a) take non-{0,1} values to be trap representations in the current sense, or (b) regard operations on non-{0,1} values of that type as giving an unspecified value. The latter would bound possible misbehaviour, which would be good for programmers; the only possible downside we are aware of is that it could limit compilation via computed branch tables indexed by unchecked _Bool values.
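As a rough illustration of the "use all the bits" observation, the following sketch checks whether `unsigned int` has any padding bits on the implementation at hand. The function names are hypothetical. If the number of value bits equals the width of the object representation, every bit pattern denotes some value, so there is no room for a trap representation of that type.

```c
#include <limits.h>
#include <stddef.h>

/* Count the value bits of unsigned int by counting the bits of UINT_MAX. */
static unsigned value_bits_unsigned_int(void)
{
    unsigned bits = 0;
    for (unsigned m = UINT_MAX; m != 0; m >>= 1)
        bits++;
    return bits;
}

/* Returns 1 if unsigned int has no padding bits, i.e. every object
   representation is the representation of some value; 0 otherwise. */
static int unsigned_int_uses_all_bits(void)
{
    return value_bits_unsigned_int() == sizeof(unsigned) * CHAR_BIT;
}
```

The same test applied to `_Bool` would be expected to fail on typical implementations, since a one-byte `_Bool` has a single value bit; that is the exception discussed above.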
One might think that floating-point "Signalling NaNs" should be regarded as trap representations (again highlighted by Joseph Myers, esp. wrt x86 Extended Precision), but as far as we can tell, that is not the case. Signalling NaNs appear to raise a floating-point exception only when used, not when read, and that exception is not undefined behaviour.
There has been much discussion of the Itanium NaT flag, but that is not a memory-value-representable entity, so it is orthogonal to trap representations. Our impression (after email with Hans Boehm) is that a sound treatment of Itanium NaT might need all reads of uninitialised values (perhaps except padding) to nondeterministically give either an unspecified value or trap. We are not trying to capture that in this proposal, as it seems a radical change that would not be consistent with how C is implemented and used on other architectures. In any case, it seems this is probably independent of whether the address of the read variable is taken, per 6.3.2.1p2, leaving the point of that clause of the standard unclear. (The relevance of Itanium for future major revisions might also be debatable - though we have no opinion on that. HPE has said it will keep support for Itanium servers until 2025.)
There is the possibility of pointers in segmented architectures in which reading a pointer value does some dynamic check. Derek Jones reported that the original 68000 did this. It would be useful to know if there are current implementations that do.
We see two options here for the next major version of C.
Ideally we think the concept of trap representation should be removed entirely, following Option (a) below; this requires adapting the treatment of _Bool somewhat, exploiting the unspecified-value semantics. If that is not feasible (e.g. because of unchecked computed branch tables involving _Bool, or NaN issues, or exotic architectures), we propose to keep trap representations but require them to be implementation-defined, Option (b).
In either case we suggest removing the 6.3.2.1p2 clause, to make uninitialised reads (of non-trap-representations for (b)) defined behaviour irrespective of whether the address of the read variable is taken.
_Bool representations with unspecified values
To replace the use of trap representations for non-{true,false} _Bool values with unspecified values for the result of operations on such values, one could make a change as below. Such values are converted (by the integer promotion rules) to other integer types before they are operated on, so the unspecified value can be introduced just at the conversion point (and then propagated as in our N2221 proposal by the operations).
Extend 6.3.1.2 Boolean type from
1 When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.59)
to:
1 When any scalar value is converted to _Bool, the result is 0 if the value compares equal to 0; otherwise, the result is 1.59) When a value of _Bool type is converted to any other scalar type, if the value is not 0 or 1 the result is an unspecified value.
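To make the distinction concrete, here is a minimal sketch of how a program can tell whether a `_Bool` object holds a 0/1 representation without ever reading it as a `_Bool`, and how it can construct a well-formed `_Bool` from an arbitrary byte. The function names are hypothetical, and the sketch assumes a one-byte `_Bool` that stores false as 0x00 and true as 0x01, which is typical of GCC and Clang ABIs but is not required by ISO C.

```c
#include <stdbool.h>

_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte _Bool");

/* Assumption (not guaranteed by ISO C): false is stored as byte 0x00
   and true as byte 0x01. Inspects the representation through a
   character-type lvalue only, so the _Bool itself is never read. */
static bool bool_repr_is_canonical(const bool *b)
{
    unsigned char byte = *(const unsigned char *)b;
    return byte <= 1;
}

/* Build a well-formed bool from an arbitrary byte using the ordinary
   scalar-to-_Bool conversion of 6.3.1.2p1, which always yields 0 or 1. */
static bool bool_from_byte(unsigned char byte)
{
    return byte != 0;
}
```

Under the current text, reading a non-canonical `_Bool` through a `_Bool` lvalue may be undefined behaviour; under the extended wording, converting it to another scalar type would instead yield an unspecified value. In either case a check of this kind lets a program avoid the situation altogether.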
Note that if one were also making control-flow choices based on unspecified values be undefined behaviour (a separate semantic choice, Q50 of N2221), this would also make unchecked _Bool computed branch tables a sound implementation technique.
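For readers unfamiliar with the technique, a computed branch table indexed by a `_Bool` can be sketched at the source level as follows. All names here are hypothetical, and this is only an illustration of the idea; a compiler would typically generate such a table internally rather than the programmer writing one.

```c
#include <stdbool.h>

typedef int (*handler_fn)(int);

static int on_false(int x) { return -x; }
static int on_true(int x)  { return x + 1; }

/* Two-entry table indexed by the promoted value of a _Bool. */
static handler_fn const table[2] = { on_false, on_true };

/* "Unchecked" dispatch: relies on b being 0 or 1. If b could hold any
   other representation, the index would fall outside the table. */
static int dispatch(bool b, int x)
{
    return table[b](x);
}

/* Checked dispatch: the index is forced into {0,1} before use, at the
   cost of an extra comparison. */
static int dispatch_checked(unsigned idx, int x)
{
    return table[idx != 0](x);
}
```

The unchecked form is the one whose soundness depends on the semantic choice described in the note: it is only safe if the implementation may assume that a `_Bool` used for control flow is 0 or 1.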
Removing trap representations entirely would also let one remove the concept of indeterminate value.
Here we suggest making the sets of trap representation values for each type be implementation-defined, thereby requiring implementations to document which representations are trap representations - and hence, in the common case that there are none for non-_Bool integer types or for pointer types, to document that. That will simplify the task of reasoning about C programs in that case, removing the uncertainty about whether an implementation might be treating some representation values as trap representations.
Change 6.2.6.1p5 from
5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.
to read
5 Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation. The set of trap representations for each object type is an implementation-defined set.