JTC1/SC22/WG14
N823
WG14/N823 C9X Public Comment
==================
Sponsoring National Body: J11 Date: 98/05/15
Author: Tom MacDonald (with help from Hugh Redelmeier)
Author Affiliation: Silicon Graphics Inc.
Postal Address: 655F Lone Oak Drive, Eagan, MN 55409 USA
E-mail Address: tam@cray.com
Telephone Number: +1 612 6835818
Fax Number: +1 612 6835307
Number of individual comments: 2
Below is a copy of a paper Hugh Redelmeier sent to the committee
over a year ago. I don't think WG14 ever adequately addressed the
issue, so I am re-submitting the paper for the June 1998 meeting.
I have made a few changes and have tried to mark each one clearly.
Tom MacDonald
tam@cray.com
================================================================
From: hugh@mimosa.com ("D. Hugh Redelmeier")
Date: Sat, 1 Feb 1997 04:45:42 -0500
To: sc22wg14@dkuug.dk
Subject: (SC22WG14.3377) DR166 -- lvalue constraints
I promised to write a paper on DR166, and I apologize for its
lateness. I have shown an earlier version to larry.jones@sdrc.com,
seebs@solon.com and gwyn@arl.mil. I have made some changes to address
their comments. I wish to thank them for their help. That does not
mean that they would approve of what I say here.
As I see it, the problem lies in the wording of 6.2.2.1, in
particular its first sentence [from c9x-std.txt on the ftp site]:
[#1] An lvalue is an expression (with an object type or an
incomplete type other than void) that designates an
object.#38
This reads as if whether an expression is recognized as an lvalue
depends on whether it really designates an object. In particular, the
DR suggests that this makes the run-time behavior of the lvalue
expression affect a constraint (a compile-time notion).
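A minimal sketch of the concern (the function name store_if_nonnull is hypothetical): the left operand of = must be a modifiable lvalue, a constraint checked at translation time, yet whether *p designates an object is known only at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example. *p is syntactically an lvalue whatever the value
   of p; the constraint that the left operand of = be a modifiable lvalue
   is checked at translation time, before p has any value. */
int store_if_nonnull(int *p, int v)
{
    if (p != NULL) {
        *p = v;   /* evaluated only when *p designates an object */
        return 1;
    }
    return 0;     /* on this path *p is never evaluated */
}
```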
There is a classic bug in English: the substitution of "that" for
"which" and vice versa. From Fowler's Modern English Usage (alas, not
the brand new edition):
Which, that, who:
... (A) of "which" and "that", "which" is appropriate to
non-defining and "that" to defining clauses. ...
...(A) "The river, which here is tidal, is dangerous", but
"The river that flows through London is the Thames."
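Read that way, the clause "that designates an object" is defining: it
makes designating an object part of what an lvalue is, rather than a
description of what an lvalue is for. I think the simple fix is to
change the first sentence of 6.2.2.1 to: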
An _lvalue_ is the form of expression used to designate an
object.#38 It shall have an object type or an incomplete type
other than void.
I think that this clearly shows the purpose of an lvalue without
making the syntactic property depend on run-time validity.
I have moved the parenthetical remark into its own sentence to simplify
and clarify the prose. I wonder whether it belongs in a Constraints
section.
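As a sketch of what the proposed second sentence admits (the names table and first_entry are hypothetical): an array declared without a size has an incomplete type until a later declaration completes it, but it is still an lvalue.

```c
#include <assert.h>

/* Hypothetical names. Until the definition below, `table` has the
   incomplete type "array of unknown size of int"; it is still an lvalue. */
extern int table[];

int first_entry(void)
{
    return table[0];   /* table is converted to a pointer to its first element */
}

int table[3] = {10, 20, 30};   /* completes the type: int[3] */
```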
Doug Gwyn suggested that expressing the intent is wimpy:
"There is no force in the 'intent' that it be used to designate an
object, except when it doesn't quite, so why bother to mention it?"
He suggests:
An _lvalue_ is an expression; it shall have an object type
or an incomplete type other than void.
I see his point, but I think that describing the purpose is useful.
I agree that the wording could be better.
It is important that any runtime restrictions be explicitly stated
somewhere. I don't think this change redistributes that burden: if
they are missing after the change, they were already missing (unless
the clause "that designates an object" was doing that job).
To express the runtime restrictions, we should add something like:
When an lvalue expression is evaluated, the behavior is undefined
if the expression does not designate an object.
or
When an lvalue expression is evaluated, it shall designate an
object.
It would probably be useful to add a footnote to this effect:
[Footnote: note that the operand of a sizeof expression is not
evaluated -- 6.3.3.4]
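A small sketch of why that footnote matters, using a hypothetical function element_size: *p is an lvalue, but because the operand of sizeof is not evaluated (double is not a variable length array type), p may be a null pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example. The operand *p is never evaluated, only its type
   is used, so it need not designate an object. */
size_t element_size(const double *p)
{
    return sizeof *p;
}
```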
Larry asked:
Can anyone think of a case where we need to require an
lvalue to designate an object even though it isn't evaluated?
I think not, but the committee should consider this.
================================================================
Note: the following is a separable issue. I have not prepared
suggested wording changes, so this cannot be considered as a proposal.
I am including it in case the committee is interested.
Many people have been surprised that the behavior of &a[upper_bound]
(the address one past the last element of a) is undefined in C89. It
was and is a common idiom. I still use it in my code and have never
used an implementation that did anything unexpected with it.
Several comments expressed ambivalence about this. I think that their
authors would like to support &a[upper_bound], but don't really like
*(a + upper_bound), and the two are hard to separate.
[[...TMacD... I suspect the `*' is a typo; it should be just (a + upper_bound)
or &(*(a + upper_bound)) ...]]
If we wish to make this form well-defined in C9x, I think we could do
so here (6.2.2.1), in the description of unary * (6.3.3.2), and in the
description of addition involving pointers (6.3.6).
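A sketch of the idiom in question (sum_range and sum_fixed_array are hypothetical names): the one-past-the-end pointer serves only as a loop bound and is never dereferenced. The expression a + 4 is the spelling C89 defines; &a[4] is the form whose behavior this section would make well-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the half-open range [begin, end); end is compared, never dereferenced. */
int sum_range(const int *begin, const int *end)
{
    int total = 0;
    for (const int *p = begin; p < end; ++p)
        total += *p;
    return total;
}

int sum_fixed_array(void)
{
    static const int a[4] = {1, 2, 3, 4};
    /* Both expressions denote the address one past the last element;
       &a[4] is the idiom the paper discusses. */
    assert(&a[4] == a + 4);
    return sum_range(a, &a[4]);
}
```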
We would need to refine the runtime restrictions that we just added to
6.2.2.1, replacing them with:
When an lvalue expression that is not the operand of a unary & is
evaluated, it shall designate an object.
When an lvalue expression that is the operand of a unary & is
evaluated, it shall designate an object or one past the last
element [[...TMacD...]] of an array object.
[Perhaps this should be reworded without "shall"; the flavor should
be clear.]
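A sketch of how the refined restriction would sort lvalues, assuming a hypothetical fixed-capacity struct stack: the lvalue in next_slot is the operand of unary & and so may designate one past the last element, while the lvalue in top is not and must designate an object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-capacity stack. */
struct stack {
    int items[4];
    size_t count;   /* number of items in use, 0..4 */
};

/* items[count] is the operand of unary &; under the refined wording it may
   designate one past the last element (when the stack is full). */
int *next_slot(struct stack *s)
{
    return &s->items[s->count];
}

/* items[count - 1] is not the operand of &, so it must designate an
   object: the caller must ensure count > 0. */
int top(const struct stack *s)
{
    return s->items[s->count - 1];
}
```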
We need to make some changes in 6.3.3.2 (Address and indirection operators).
Here is one paragraph from the current 6.3.3.2 that would need changing:
[#4] The unary * operator denotes indirection. If the
operand points to a function, the result is a function
designator; if it points to an object, the result is an
lvalue designating the object. If the operand has type
``pointer to type,'' the result has type ``type.'' If an
invalid value has been assigned to the pointer, the behavior
of the unary * operator is undefined.#49
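A brief sketch of the two cases that paragraph distinguishes (square and apply_and_store are hypothetical names): applied to a pointer to function, unary * yields a function designator; applied to a pointer to object, it yields an lvalue.

```c
#include <assert.h>

int square(int x) { return x * x; }

/* *fp is a function designator; *ip is an lvalue designating an int. */
int apply_and_store(int (*fp)(int), int *ip, int arg)
{
    *ip = (*fp)(arg);   /* call through the designator, store through the lvalue */
    return *ip;
}
```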
Here is a paragraph from the current 6.3.6 (Additive operators) whose
end would need to be adjusted:
[#8] When an expression that has integral type is added to
or subtracted from a pointer, the result has the type of the
pointer operand. If the pointer operand points to an
element of an array object, and the array is large enough,
the result points to an element offset from the original
element such that the difference of the subscripts of the
resulting and original array elements equals the integral
expression. In other words, if the expression P points to
the i-th element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n)
point to, respectively, the i+n-th and i-n-th elements of
the array object, provided they exist. Moreover, if the
expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the
last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the
pointer operand and the result point to elements of the same
array object, or one past the last element of the array
object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined. Unless both the
pointer operand and the result point to elements of the same
array object, or the pointer operand points one past the
last element of an array object and the result points to an
element of the same array object, the behavior is undefined
if the result is used as an operand of the unary * operator.
This paragraph seems very fragile; in fact, I'm not sure that it
works. For our purpose, I think the only change needed would be to
delete the last sentence, whose job should instead be done by
appropriate wording in 6.3.3.2.
Hugh Redelmeier
hugh@mimosa.com voice: +1 416 482-8253
=================== TMacD's proposed rewrite of 6.3.3.2 ====================
6.3.3.2 Address and indirection operators
Constraints
[#1] The operand of the unary & operator shall be either a
function designator, the result of a [] or unary * operator,
or an lvalue that designates an object that is not a
                                      ^
                                      , or one element past the last element of an array,
bit-field and is not declared with the register
storage-class specifier.
[#2] The operand of the unary * operator shall have pointer
type.
Semantics
[#3] The result of the unary & (address-of) operator is a
pointer to the object or function designated by its operand.
           ^^^^^^^^^^
           an object, or one element past the last element of an array,
If the operand has type ``type'', the result has type
``pointer to type''. If the operand is the result of a
unary * operator, neither that operator nor the & operator
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                  neither operator
are evaluated, and the result shall be as if both were
^^^
is
omitted, even if the intermediate object does not exist,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     resulting pointer does not point to
                     an object with an effective type
                     (described in 6.3) that can be accessed
                     through this pointer.
except that the constraints on the operators still apply and
^^^^^^^^^^^
However,
the result is not an lvalue. Similarly, if the operand is
the result of a [] operator, neither the & operator nor the
unary * that is implied by the [] are evaluated, and the
result shall be as if the & operator was removed and the []
operator was changed to a + operator.
[#4] The unary * operator denotes indirection. If the
operand points to a function, the result is a function
designator; if it points to an object, the result is an
lvalue designating the object. If the operand has type
``pointer to type'', the result has type ``type''. If an
invalid value has been assigned to the pointer, the behavior
of the unary * operator is undefined.#71
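A sketch of the operands that [#1] of the rewrite accepts for unary & (all names are hypothetical). The forms it forbids appear only as comments, since a conforming compiler must diagnose them.

```c
#include <assert.h>

struct flags {
    unsigned ready : 1;   /* a bit-field: &f.ready would violate [#1] */
    int count;
};

int twice(int x) { return 2 * x; }

/* Hypothetical demonstration of operands [#1] permits for unary &. */
int address_of_examples(void)
{
    struct flags f = {1, 3};
    int arr[2] = {7, 8};
    int *ip = &f.count;          /* lvalue designating a non-bit-field object */
    int (*fp)(int) = &twice;     /* function designator */
    int *ep = &arr[1];           /* result of a [] operator */
    int *sp = &*ip;              /* result of a unary * operator */
    /* int *bad = &f.ready;        constraint violation: bit-field      */
    /* register int r; &r;         constraint violation: register class */
    assert(sp == ip);
    return fp(*ip) + *ep;        /* twice(3) + 8 */
}
```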
[[... TMacD ...]] Although Hugh suggests rewriting paragraph 4 above,
I think the current wording works. The last sentence
could, however, be rewritten as:
If the pointer does not point to an object, the
behavior is undefined.
I also don't think this wording handles the following:
&p.a &p->a
where "a" is a member of a union and "p" points
one element past the end of an array. I am not sure
whether this is the intent.
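For contrast with those member-access cases, a sketch of the case that [#3] of the rewrite does cover (addr_of_deref is a hypothetical name): &*p behaves as if both operators were omitted, so *p is not evaluated and a null p is returned unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Under [#3], *p is not evaluated here, so p may be a null pointer.
   The result is not an lvalue, so &*p = q would still be invalid. */
int *addr_of_deref(int *p)
{
    return &*p;
}
```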