JTC1/SC22/WG14
N1008
Sequence points and related issues
==================================
SC22/WG14/N1008
Clive Feather
<clive@demon.net>
Last changed 2003-04-30
Introduction
This paper describes a "Consensus Model" for addressing the issues of
sequence points and the evaluation of expressions. It then points out
the major problems with the current wording of the Standard and proposes
changes.
The Consensus Model
While there is still debate on many aspects of these matters, consensus
appears to have emerged on certain features of the model. This "consensus
model" works thus.
Evaluation of an expression implies the presence of one or more "events";
the consensus model does not cover the details of what events exist or how
they relate to the expression (N925 is an example of such a model that is
compatible with the consensus model), but it is agreed that the significant
ones are "read" and "write" events and sequence points. For example, the
expression "x = y + z" involves read events for y and z and a write event
for x. The order that events occur in is generally unspecified, but is
constrained in two important ways:
(1) causality - in "x = y + z", the write event for x must come after the
two read events;
(2) sequencing - where two events are required by the Standard to come in a
particular order; for example, in "x = 0, y = 0", the write event for
x must come before that for y.
Because the latter is often described by the Standard in terms of sequence
points, these become a third significant type of event.
Having identified the events, it is necessary (in principle) to examine
every possible ordering that is consistent with the above constraints.
* If any possible ordering could cause two write events, or a write event
followed by a read event, for the same object to occur with no
intervening sequence point, the behaviour is undefined. Note that:
- this does not apply to two read events, or when the read event precedes
the write event;
- the requirement is on the programmer to ensure there is no such
ordering possible, not on the implementation to choose a different
order.
* Otherwise the expression is legitimate. However, if two write events, or
a read event and a write event, for the same object can occur either way
round, it is unspecified which of the two values written is the final
value of the object, or whether the value read is the one written or is
the previous value; this may in turn affect other results.
As an example of the first rule, the expression "x = x++" involves two
write events for x and no contraints on their relative order. Therefore
the behaviour is undefined. Similarly, "x++ * -x" involves two read events
for x and one write event; while the read event associated with "x++"
must come before the write event, that associated with "-x" need not and so
the behaviour is undefined. On the other hand, "x++" is legitimate because
the write event must be after the read event, and "x = 1, x = 2" is
legitimate because the sequence point separates the two writes.
The rationale behind these rules is based on optimisation and parallel
execution. When re-organising code, an optimiser may well want to move a
read to an earlier point than it is absolutely needed, or save a generated
value in a register and move the write to a later point. Sequence points
represent the limits of how such moves can affect the behaviour of code
(the optimiser can move the read or write further, but must produce results
that are "as if" it had not). "Parallel execution" refers not just to
systems with more than one processor, but to any arrangement (such as
pipelining or auto-spilling of registers) where two actions can be
happening at once. Where the same memory location is accessed through two
different paths at the same time, these systems may not guarantee the
results - for example, the value actually stored might be the bitwise-OR
of the two values being written. Similarly, if an object consists of more
than one bus-width of data, the two writes might be interleaved in an
unpredictable manner.
Standard text
The Standard has the following to say about sequence points in general:
5.1.2.3:
[#2...] At certain specified
points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken
place.
[#4] When the processing of the abstract machine is
interrupted by receipt of a signal, only the values of
objects as of the previous sequence point may be relied on.
Objects that may be modified between the previous sequence
point and the next sequence point need not have received
their correct values yet.
[#5] The least requirements on a conforming implementation
are:
-- At sequence points, volatile objects are stable in the
sense that previous accesses are complete and
subsequent accesses have not yet occurred.
6.5:
[#1] An expression is a sequence of operators and operands
that specifies computation of a value, or that designates an
object or a function, or that generates side effects, or
that performs a combination thereof.
[#2] Between the previous and next sequence point an object
shall have its stored value modified at most once by the
evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.70)
[#3] The grouping of operators and operands is indicated by
the syntax.71) Except as specified later (for the function-
call (), &&, ||, ?:, and comma operators), the order of
evaluation of subexpressions and the order in which side
effects take place are both unspecified.
6.8:
[#2] A statement specifies an action to be performed.
Except as indicated, statements are executed in sequence.
7.1.4:
156 Such macros might not contain the sequence points that
the corresponding function calls do.
and, as an example of how they are used in specifying execution sequence:
6.5.2.4:
[#2...] The side effect of updating the
stored value of the operand shall occur between the previous
and the next sequence point.
Problems with the Standard text
The problems with the above extracts can be summarised as follows:
(1) It is unclear that the Consensus Model can be derived from the above.
(2) In particular, it is unclear how to apply terms such as "previous
sequence point" to expressions like "x + (y, z)", where a sub-expression
contains a sequence point but the whole expression involves an unspecified
order of evaluation.
(3) A strict reading of the wording "shall be read only to determine the
value to be stored" would prevent the value being used for any other
purpose as well, such as determining the value to be stored in another
object.
(4) Though WG14 has stated that "statements are executed in sequence"
means that the execution of a function call is atomic with respect to the
surrounding expression (that is, the events of the expression all occur
strictly before or strictly after the events of the function call), it is
far from clear how to derive this from the Standard.
Wording proposals
These proposals are not in order in the Standard, but rather in conceptual
order.
First we need to make it clear that sequence points are not global, but
must be read in context.
5.1.2.3:
[#2...] At certain specified
points in the execution sequence called sequence points, all
side effects of previous evaluations shall be complete and
no side effects of subsequent evaluations shall have taken
place.
+ Note that some evaluations are unordered with respect to a given
+ sequence point and, unless stated otherwise, may therefore occur
+ before or after that sequence point.
Next we address the basic rules for when access to an object in an
expression involves undefined behaviour. This is done by introducing
the concept of "collateral expressions", which are expressions whose
relative order is not specified and whose sequence points do not affect
one another. We change the first three paragraphs of 6.5:
6.5:
[#1] An expression is a sequence of operators and operands
that specifies computation of a value, or that designates an
object or a function, or that generates side effects, or
that performs a combination thereof.
* [#2] The precedence of operators and operands is indicated by
the syntax.71) Except as specified later (for the function-
call (), &&, ||, ?:, and comma operators), the order of
evaluation of subexpressions and the order in which side
effects take place are both unspecified.
+ Where the order of evaluation of two expressions is
+ unspecified, they are /collateral expressions/. If E1 and E2
+ are collateral subexpressions and E3 is a subexpression of E2,
+ then E1 and E3 are also collateral expressions (this
+ definition applies recursively).
* [#3] If an object has its stored value (or two overlapping
* objects have the overlapping part of their stored values)
* accessed by two expressions, and one of the accesses modifies
* the value, then the expressions shall not be collateral
* expressions (but one may be within a function called by a
* collateral expression of the other; see 6.5.2.2). Furthermore,
* either the accesses shall be separated by a sequence point or
* else the other access shall be a read and
the prior value
shall be read only to determine the value to be stored.70)
(this last part will be re-addressed later on).
There are a couple of places where multiple expressions occur and it is
important to add wording to clarify which are collateral expressions.
These are array declarations in 6.7.5.2:
[#5] If the size is an expression that is not an integer
constant expression: if it occurs in a declaration at
function prototype scope, it is treated as if it were
replaced by *; otherwise, each time it is evaluated it shall
have a value greater than zero.
+ All of the size expressions in a full declarator that are
+ evaluated are collateral expressions of each other.
The size of each instance
of a variable length array type does not change during its
lifetime. Where a size expression is part of the operand of
a sizeof operator and changing the value of the size
expression would not affect the result of the operator, it
is unspecified whether or not the size expression is
evaluated.
and initializer lists in 6.7.8:
+ [#23] The expressions in an initializer list are collateral
+ expressions of each other, and therefore the
order in which any side effects occur among the
initialization list expressions is unspecified.130)
Next we address the issue of making function calls atomic. This is done
by amending 6.5.2.2#10:
* [#10] The order of evaluation of the function designator
* and the actual arguments is unspecified, but there are sequence
* points before and after the actual call.
+ All side-effects within the function shall take place between
+ these two sequence calls. All side-effects in collateral
+ expressions of the function call shall either be complete before
+ the function is called or shall not take place until after the
+ function returns.
As well as the additional text, there are two changes to the existing
words: the reference to subexpressions of the arguments is dropped
(since it has no effect) and a sequence point is added after the call
(making it clear that it is "before" any subsequent subexpression).
We can then add some further examples:
+ [#13] EXAMPLE Given:
+
+ static int x;
+ int get_x (void) { return x; }
+ int put_x (int y) { x = y; return 0; }
+
+ then each of the following are legitimate, though it is
+ unspecified which of two values results:
+
+ x = 4; int z = x + put_x(8); // z could be 4 or 8
+ x = 4; int z = x++ + get_x(); // z could be 8 or 9
+
+ Similarly, in:
+
+ x = 4; int z = x++ + put_x(8);
+
+ there are three possibilities:
+ * increment stored before call: x becomes 8, z becomes 4
+ * increment stored after call: x becomes 5, z becomes 4
+ * increment all done after call: x becomes 9, z becomes 8
+
+ Note that the following expressions, with apparently similar
+ intent, all involve undefined behaviour:
+
+ z = x + (x = 8, 0);
+ z = x++ + x;
+ z = x++ + (x = 8, 0);
The above change also allows us to drop a paragraph in 7.1.4:
- [#3] There is a sequence point immediately before a library
- function returns.
since this sequence point was only put there to ensure there is one
at the end of any function call. We might also consider extended a
related footnote:
156 Such macros might not contain the sequence points that
the corresponding function calls do.
+ Thus sin(x)+sin(y) might involve undefined behaviour even though
+ (sin)(x)+(sin)(y) does not.
Finally, the last part of 6.5#2 (6.5#3 after the above changes) still
has the problem described earlier - a strict reading of the wording
"shall be read only to determine the value to be stored" would prevent
the value being used for any other purpose as well. The change:
[...] one of the accesses modifies the value [...]
else the other access shall be a read
* used to determine the value that is stored.70)
would help here, though it still isn't clear what that means. However,
I believe (though cannot at this stage prove) that saying:
[...] one of the accesses modifies the value [...]
else the other access shall be a read
* which is part of the evaluation of an operand of the ++, --,
* or assignment operator which makes the modification.70)
would, when combined with the wording from above:
then the expressions shall not be collateral expressions
be sufficient. In other words, the following wording for 6.5#3 would
establish all the essential features of the Consensus Model:
[#3] If an object has its stored value (or two overlapping
objects have the overlapping part of their stored values)
accessed by two expressions, and one of the accesses modifies
the value, then the expressions shall not be collateral
expressions (but one may be within a function called by a
collateral expression of the other; see 6.5.2.2). Furthermore,
either the accesses shall be separated by a sequence point or
else the other access shall be a read which is part of the
evaluation of an operand of the ++, --, or assignment operator
which makes the modification.