JTC1/SC22/WG14
N847
SC22/WG14 N847 1998-09-04
Issues with CD2
Clive Feather
clive@demon.net
The following 43 items represent issues with CD2. Regrettably, many of them
were also issues with CD1 and do not seem to have been addressed; where
possible the item has been rewritten to explain the problem better.
Items 1 to 42 are in order of location within CD2.
[Item 01, based on PC-UK0021]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 4
Title: Further requirements on the conformance documentation
Detailed description:
The Standard requires an implementation to be accompanied by documentation
of various items. However, there is a subtle difference between the terms
"implementation-defined" and "described by the implementation" which has
been missed by this wording (this is partly due to the tightening up of the
uses of this term between C89 and C9X - see for example subclause 6.10.6).
As a result, the wording does not actually require the latter items to be
documented.
Change the paragraph to:
An implementation shall be accompanied by a document that describes
all features that this International Standard requires to be described
by the implementation, including all implementation-defined
characteristics and all extensions.
========
[Item 02, based on PC-UK0001]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 5.1.1.2
Title: Error in applying working paper N673
Detailed description:
When N673 was applied to the draft, a new footnote was erroneously left
out. The following footnote should be included, with a reference at the end
of translation phase 2:
[*] Thus the physical source lines (delimited by | characters):
|\\\|
||
|n|
generate the logical source lines:
|\\|
|n|
and a source file may end with a backslash followed by two
physical newlines, which will generate a last logical source
line ending in a backslash.
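
The splicing rule in this footnote can be sketched in code. The function
name splice_lines is hypothetical and is not part of any proposal; it
makes a single left-to-right pass, so a backslash that only becomes
adjacent to a newline as a result of an earlier deletion is not spliced
again, which is the behaviour the footnote describes.

```c
#include <stddef.h>

/* Copies src to dst, deleting every backslash that is immediately
   followed by a newline, together with that newline (translation
   phase 2).  dst must have room for strlen(src) + 1 bytes.
   Returns the number of characters written, excluding the terminator. */
size_t splice_lines(const char *src, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\\' && src[i + 1] == '\n') {
            i++;                /* skip the newline as well */
            continue;
        }
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

Given the three physical lines shown above (three backslashes, an empty
line, and n), the output holds the two logical lines from the footnote.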
========
[Item 03]
Category: Feature that should be included
Committee Draft subsection: 5.1.1.2, 5.2.2, 6.4.4.4
Title: provide a \s character
Detailed description:
Translation phase 5 states that if the execution character set cannot
represent a character in the source set, it is converted to "an
implementation-defined member" (of the execution character set). It would
be useful to have access to this character in a consistent manner, and the
escape sequence \s ("substitute") is proposed for this purpose.
Change translation phase 5 (5.1.1.2p1) to end:
if there is no corresponding member, it is converted to the
character represented by \s.
In 5.2.2p2, add an entry to the list:
\s (substitute) Produces a visible indication that a source character
was used that does not correspond to a member of the execution
character set. The active position is advanced as for a graphic
character.
In 6.4.4.4, add \s to "simple-escape-sequence" in p1 and to the list in p8.
========
[Item 04, based on PC-UK0027]
Category: Inconsistency
Committee Draft subsection: 5.2.1 plus scattered other changes
Title: inconsistencies in use of "basic" and "extended" character sets
Detailed description:
[Please note: this is *not* a UCN issue.]
The Standard uses the terms "basic character set" and "extended character
set" at various places. However, the exact meaning of these two is not
clear, and this leads to confusion.
Consider the UTF-8 encoding (codes from 0 to 127 are single byte, codes
from 128 to 255 form part of multibyte characters with length from 2 to 6
bytes). At execution time there are five "interesting" character sets based
on this encoding:
[1] The 95 characters required by 5.2.1p3, plus the null character.
[2] The 128 single byte characters.
[3] The 2**31 multibyte characters.
[4] Set [3] minus set [1].
[5] Set [3] minus set [2].
(and of course the corresponding source sets).
It is unclear whether the "basic character set" means [1] or [2] or something
else, and as a result the Standard has to use circumlocutions such as "the
required characters". Looking at the various places where the term is used
has led me to believe that it is most useful to have terms for [1] and
for [4], while there is little or no need to refer to any of the others.
Therefore it would be logical for "basic character set" to represent [1] (the
set of characters required in all implementations) and "extended character
set" to represent [4] (any other characters provided by the specific
implementation).
This requires the following changes:
Replace 5.2.1p1, second sentence, by:
Each set is further divided into a /basic character set/, whose contents
are given by this subclause, and an /extended character set/, consisting
of zero or more locale-specific members (which are not members of the
basic character set).
[Note that this defines the two terms.]
In 5.2.1p3, delete "at least" in the first sentence, and in the fourth
sentence change "In the execution character set" to "In the basic
execution character set".
Replace 5.2.1.2p1, first bullet, by:
- The basic character set shall be present and shall be encoded
using single-byte characters.
In 6.2.5p3, replace "required source character set enumerated in 5.2.1"
with "basic execution character set". (Note that the execution set is more
sensible in this context than the source set.)
In 6.4.2.1p3, change "that are not part of the required source character
set" to "that are in the extended source character set".
In 6.4.3p2 and p3, change "required" to "basic".
In 6.4.4.4p8 change "required" to "basic".
Change 7.1.1p2 to:
A /letter/ is one of the 52 lowercase and uppercase letters in the
basic execution character set. All letters are printing characters.
In Annex Ip2, delete "required".
In Annex K.2p1, third bullet, change "required" to "basic".
In Annex K.2p1, fifth bullet, delete "required".
In Annex K.3.4p1, fourth bullet, change "required" to "basic".
Change K.4p1, first bullet, to:
- Any members of the extended execution character set (5.2.1).
Change K.4p1, second bullet, to:
- The presence, meaning, and representation of any multibyte
characters in the extended execution character set (5.2.1.2).
In Annex K.5.2p1, change "required" to "basic".
========
[Item 05, based on PC-UK0015]
Category: Feature that should be included
Committee Draft subsection: 5.2.4.2.1
Title: ensure int can hold all characters
Detailed description:
A number of functions in the standard library (particularly in <ctype.h> and
<stdio.h>) assume that an int is capable of holding every possible unsigned
char value. If this is not the case then these functions will, to say the
least, behave in a peculiar manner. While it can be argued that this
implies that int must be able to hold every unsigned char value, it would
be better to make this an explicit requirement on the implementation.
To do so, append to 5.2.4.2.1p2:
On a hosted implementation, INT_MAX shall be not less than UCHAR_MAX.
Note that this does *not* forbid char and int from being the same type.
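
A program that depends on this property could check it for itself along
the following lines. This is only a sketch; the function name count_upper
is hypothetical.

```c
#include <ctype.h>
#include <limits.h>

/* Reject implementations on which some unsigned char value does not fit
   in an int, which is the situation the proposed requirement rules out. */
#if INT_MAX < UCHAR_MAX
#error "int cannot represent every unsigned char value"
#endif

/* Counts the uppercase letters in s.  Each character is passed to
   isupper as an unsigned char value converted to int, which is only
   meaningful if int can hold that value. */
int count_upper(const char *s)
{
    int n = 0;

    for (; *s != '\0'; s++) {
        if (isupper((unsigned char)*s))
            n++;
    }
    return n;
}
```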
========
[Item 06, based on PC-UK0047]
Category: Request for information/clarification
Committee Draft subsection: 6.10
Title: Parsing ambiguity in preprocessing directives
Detailed description:
Consider parsing the following text during the preprocessing phase
(translation phase 4):
# if 0
xxxx
# else
yyyy
# endif
The third line fits the syntax for the first option of group-part, and thus
generates two possible parsings. One of these will cause both text lines to
be skipped, while the other only causes the first to be skipped.
It is easy to fix this ambiguity. In the syntax in 6.10p1, change group-part
to:
group-part:
non-directive new-line
if-section
control-line
and add:
non-directive:
pp-tokens/opt
Then add a new paragraph to the Constraints, after 6.10p3:
The first preprocessing-token (if any) in a non-directive shall not
be /#/.
Finally, delete 6.10.3p8, because this can no longer occur.
Note that this change has the added benefit of making it clear that unknown
preprocessing directives require a diagnostic and do not affect conditional
inclusion.
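
The intended parsing can be observed with a fragment of the same shape as
the one above. The macro and function names here are hypothetical.

```c
# if 0
#  define SELECTED_BRANCH 1   /* skipped: the condition is false */
# else
#  define SELECTED_BRANCH 2   /* kept: the else line is a directive */
# endif

/* Reports which group survived conditional inclusion. */
int selected_branch(void)
{
    return SELECTED_BRANCH;
}
```

Under the other parsing the else line would be treated as text inside the
skipped group, and SELECTED_BRANCH would not be defined at all.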
========
[Item 07, based on PC-UK0049]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.10.1
Title: Handling of UCNs in character constants in #if directives
Detailed description:
Consider the line:
#if '\u0024' < 100
where dollar is in the single-byte execution character set. It is not
completely clear from 6.10.1p3 that the UCN is converted to a single
character, since this normally happens in translation phase 5 and it is
not specifically mentioned here.
In 6.10.1p3 (near the end), change:
... which may involve converting escape sequences into execution
character set members.
to:
... which may involve converting escape sequences and universal
character names into execution character set members in the manner
of translation phase 5.
========
[Item 08, based on PC-UK0071]
Category: Inconsistency
Committee Draft subsection: 6.10.2
Title: Clarify included file process
Detailed description:
6.10.2p3 ends:
If this search is not supported, or if the search fails, the directive
is reprocessed as if it read
#include <h-char-sequence> new-line
with the identical contained sequence (including > characters, if any)
from the original directive.
The wording is technically incorrect, precisely because the original
directive could contain angle brackets within the quotes whereas an
h-char-sequence cannot. Better wording would be:
If this search is not supported, or if the search fails, the directive
is reprocessed as if it read
#include <h-char-sequence> new-line
with the identical contained sequence from the original directive
(if the q-char-sequence contains a > character, this is retained in
the name searched for even though it could not appear in a true
h-char-sequence).
========
[Item 09, based on PC-UK0052]
Category: Feature that should be included
Committee Draft subsection: 6.10.3
Title: Add a __VA_COUNT__ facility for varargs macros
Detailed description:
Unlike with function calls, it is trivial for an implementation to
determine the number of arguments that match the ... in a varargs macro.
There are a number of useful things that can be done with this (at the
least, providing argument counts to varargs functions). Therefore this
information should be made available to the macro expansion.
In 6.10.3p5, change
The identifier /__VA_ARGS__/ ...
to:
The identifiers /__VA_ARGS__/ and /__VA_COUNT__/ ...
Append to 6.10.3.1p2:
An identifier /__VA_COUNT__/ that occurs in the replacement list
shall be replaced by a single token which is the number of trailing
arguments (as a decimal constant) that were merged to form the variable
arguments.
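
For comparison, the count can be approximated with the varargs macros
already in the draft, at the cost of a fixed upper limit. The following
is a sketch only: COUNT_ARGS, COUNT_ARGS_PICK, SUM and sum_ints are
hypothetical names, and the macro handles between 1 and 8 arguments.

```c
#include <stdarg.h>

/* Stand-in for the proposed __VA_COUNT__.  The arguments push the
   descending list to the right, so the value landing in position n
   is the number of arguments supplied (1 to 8 only). */
#define COUNT_ARGS(...) \
    COUNT_ARGS_PICK(__VA_ARGS__, 8, 7, 6, 5, 4, 3, 2, 1, 0)
#define COUNT_ARGS_PICK(a1, a2, a3, a4, a5, a6, a7, a8, n, ...) n

/* Adds up count trailing int arguments. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Supplies the argument count to the varargs function automatically. */
#define SUM(...) sum_ints(COUNT_ARGS(__VA_ARGS__), __VA_ARGS__)
```

A built-in __VA_COUNT__ would remove both the upper limit and the helper
macro.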
========
[Item 10, based on PC-UK0054]
Category: Other: C++ conflict avoidance
Committee Draft subsection: 6.10.8
Title: Require that __cplusplus not be defined
Detailed description:
Add to 6.10.8 a new paragraph 5:
The implementation shall not predefine the macro /__cplusplus/,
nor shall it define this macro in any header defined in clause 7.
This change was agreed by the full committee at the Menlo Park meeting, but
seems to have been lost.
========
[Item 11, based on PC-UK0169]
Category: Feature that should be included
Committee Draft subsection: 6.10.8
Title: provide a __STDC_HOSTED__ macro
Detailed description:
There is currently no way for a program to determine if the implementation
is hosted or freestanding. A standard predefined macro should be provided.
Add to the list in 6.10.8p1:
__STDC_HOSTED__ The decimal constant 0 if the implementation is a
freestanding one and the decimal constant 1 if it is
a hosted one.
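
A program could then test the macro as sketched below. The function name
hosted_status is hypothetical, and the first branch allows for an
implementation that does not provide the macro.

```c
/* Returns 1 on a hosted implementation, 0 on a freestanding one, and
   -1 if the implementation does not define the macro at all. */
int hosted_status(void)
{
#if !defined(__STDC_HOSTED__)
    return -1;
#elif __STDC_HOSTED__
    return 1;
#else
    return 0;
#endif
}
```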
========
[Item 12, based on PC-UK0024]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.2.5
Title: Replace footnote 25
Detailed description:
Footnote 25 is unclear in the context in which it appears (implementation-
defined types). The wording of footnote 29 explains what is meant much more
clearly, and can be applied to both situations.
Replace the text of footnote 25 with that of footnote 29, and change all
references to the latter to be references to the former.
========
[Item 13, based on PC-UK0050]
Category: Inconsistency
Committee Draft subsection: 6.2.6.1, 6.5.2.3
Title: Effects on other members of assigning to a union member
Detailed description:
6.5.2.3p5 has wording concerning the storing of values into a union member:
With one exception, if the value of a member of a union object is
used when the most recent store to the object was to a different
member, the behavior is implementation-defined.
When this wording was written, "implementation-defined" was interpreted
more loosely and there was no other relevant wording concerning the
representation of values. Neither of these is the case anymore.
The requirement to be implementation-defined means that an implementation
must ensure that all stored values are valid in the types of all the other
members, and eliminates the possibility of them being trap representations.
It also makes it practically impossible to have trap representations at all.
This is not the intention of other parts of the Standard.
It turns out that the wording of 6.2.6.1 is sufficient to explain the
behavior in these circumstances, and the cited wording in 6.5.2.3 merely
muddles the issue. It should be removed; the rest of the paragraph can
stand alone.
========
[Item 14]
Category: Improved terminology (technically normative)
Committee Draft subsection: 6.2.4, plus scattered other changes
Title: better terminology for object lifetimes
Detailed description:
The term "lifetime" is used at a few places in the Standard but never
defined. Meanwhile a number of places use circumlocutions such as "while
storage is guaranteed to be reserved". These would be much easier to read
if the term "lifetime" were defined and used.
Make the following changes to subclause 6.2.4.
Delete paragraph 5 and insert a new paragraph between 1 and 2:
The /lifetime/ of an object is the portion of program execution
during which storage is guaranteed to be reserved for that object.
An object exists and retains its last-stored value throughout its
lifetime. Objects with static or automatic storage duration have a
constant address throughout their lifetime.23 If an object is referred
to outside its lifetime, the behavior is undefined. The value of a
pointer is indeterminate after the end of the lifetime of the object
it points to.
Change paragraphs 2 to 4 (which will become 3 to 5) to:
[#2] An object whose identifier is declared with external or
internal linkage, or with the storage-class specifier static,
has static storage duration. The lifetime of the object is the
entire execution of the program. Its stored value is initialized
only once.
[#3] An object whose identifier is declared with no linkage
and without the storage-class specifier static has automatic
storage duration. For objects that do not have a variable
length array type, the lifetime extends from entry into the
block with which it is associated until execution of the block
ends in any way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.)
If the block is entered recursively a new object is created each
time. The initial value of the object is indeterminate; if an
initialization is specified for the object, it is performed each
time the declaration is reached in the execution of the block;
otherwise, the value becomes indeterminate each time the
declaration is reached.
[#4] For objects that do have a variable length array type, the
lifetime extends from the declaration of the object until execution
of the program leaves the scope of that declaration24. If the scope
is entered recursively a new object is created each time. The initial
value of the object is indeterminate.
Other changes:
In 5.1.2p1 change "in static storage" to "with static storage duration".
Change footnote 9 to:
9) In accordance with 6.2.4, a call to exit will remain within the
lifetime of objects with automatic storage duration declared in main
but a return from main will end their lifetime.
Delete 5.1.2.3p5 as it just duplicates material in 6.2.4p3-4.
Change the last portion of 6.5.2.5p17 to:
of the loop only, and on entry next time around p would be
pointing to an object outside of its lifetime, which would
result in undefined behavior.
Change the last portion of footnote 72 to:
and the address of an automatic storage duration object after the
end of its lifetime.
Change the first sentence of 6.7.3.1p5 to:
Here an execution of B means the lifetime of a notional object with
type /char/ and automatic storage duration associated with B.
Add to 7.20.3 a second paragraph:
The lifetime of an object allocated by the calloc, malloc, or
realloc functions extends from the function call until the
object is freed by the free or realloc functions. The object has a
constant address throughout its lifetime except when moved by a call
to the realloc function.
The last sentence of 7.20.3p1 is redundant and could be deleted.
Relevant bullet points in annex K should also be changed.
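
The three kinds of lifetime described by the proposed wording can be
illustrated as follows. This is a sketch; all of the names are
hypothetical.

```c
#include <stdlib.h>

/* Static storage duration: the object is initialized only once and its
   lifetime is the entire execution of the program. */
int next_id(void)
{
    static int id = 0;
    return ++id;
}

/* Automatic storage duration: a new object is created on each entry to
   the block, and the initialization is performed each time the
   declaration is reached. */
int fresh_counter(void)
{
    int counter = 0;
    return ++counter;
}

/* Allocated storage: the lifetime extends from the malloc call until
   the caller passes the pointer to free. */
int *make_cell(int value)
{
    int *cell = malloc(sizeof *cell);

    if (cell != NULL)
        *cell = value;
    return cell;
}
```

Returning the address of counter from fresh_counter would hand the caller
a pointer to an object whose lifetime has ended.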
========
[Item 15]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.4.3
Title: reword the list of forbidden UCNs
Detailed description:
Change 6.4.3p2 to read:
A universal-character-name shall not specify (in either form) a
character short identifier less than 000000A0 other than:
00000024 00000040 00000060
or in the range 0000D800 to 0000DFFF inclusive.
This wording makes it easier to understand the restriction, because it is
not necessary to cross-reference the list in 5.2.1 and then determine the
UCNs of those characters.
========
[Item 16, based on PC-UK0026]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.4.5
Title: improve the example of character string literals
Detailed description:
Append to the example in 6.4.5p7:
When this is used to initialize a static array, the array has three
members that are initialized to /18/, the value of /'3'/, and /0/
respectively.
========
[Item 17, based on PC-UK0036]
Category: Normative change where the intent is unclear
Committee Draft subsection: 6.5.16
Title: Define the result of the assignment operator
Detailed description:
6.5.16p3 states:
An assignment expression has the value of the left operand after the
assignment, but is not an lvalue.
Two interpretations have been put on this wording:
* the value of the assignment expression is the value that will also be
stored in the left operand ("same-value" semantics);
* the value of the assignment expression is the result of reading the left
operand after storing the value in it ("write-then-read" semantics).
These two have different results when the left operand is a volatile object
that can be changed by external causes (such as a clock or a memory-mapped
device register). This ambiguity needs to be resolved.
Consider the code:
int x;
extern volatile int system_timer; // precision of 1 microsecond
extern volatile int serial_port; // writing sends a word, reading
// returns the next word received
// ...
x = system_timer = 42; // statement 1
serial_port = 66; // statement 2
With same-value semantics, statement 1 will set x to 42 and statement 2 will
send the value 66 to the serial port. With write-then-read semantics,
statement 1 will set x to some other value (reflecting the change in the
timer between writing to it and reading it back).
More important, though, are the effects of statement 2 under write-then-read
semantics. Because an expression statement is evaluated for its side effects,
it is reasonable to require the value of the assignment expression to be
determined before being thrown away (in particular, there is *no* statement
in the Standard as to when the value of the assignment expression is or is
not evaluated). This means that statement 2 always has the side effect of
reading a word from the serial port, and there is no way to write without
reading.
Assuming that same-value semantics are intended, replace the cited words
by:
The value of the assignment expression is the value stored into the
left operand, but is not an lvalue.
If write-then-read semantics are intended, replace the cited words by
something along the lines of:
The value of the assignment expression is the result of reading the
left operand after the value has been stored in it [*], but is not an
lvalue. If the value of the assignment expression is not used as an
operand in another expression, it is unspecified whether or not the
left operand is actually read.
[*] Thus if the left operand has volatile-qualified type and can be
changed by external means, the value of the expression might not be
the same as the value stored.
[I do *not* claim that these latter words actually have the desired effect.]
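
Until the ambiguity is resolved, code that needs the stored value can
avoid using the value of a volatile assignment expression at all. The
helper below is a sketch with a hypothetical name. It still relies on the
assumption that an assignment whose value is discarded does not read the
left operand, which is the very point this item asks to have settled.

```c
/* Stores v into *dst and returns the value that was stored, without
   using the value of the assignment expression itself. */
int store_and_return(volatile int *dst, int v)
{
    *dst = v;       /* the only intended access: one write */
    return v;       /* the value written, not a fresh read of *dst */
}
```

With this helper, statement 1 would be written as
x = store_and_return(&system_timer, 42).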
========
[Item 18, based on PC-UK0033]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.5.2.2
Title: Fix wording relating to "number of arguments"
Detailed description:
6.5.2.2p2 states "the number of arguments shall agree with the number of
parameters". This does not clearly take account of varargs functions.
Similarly the second sentence does not allow for the trailing arguments of
varargs functions.
Change the paragraph to:
If the expression that denotes the called function has a type that
includes a prototype, the number of arguments shall agree with the
number of parameters (that is, if the prototype contains an ellipsis
there shall be at least as many arguments as parameters, otherwise
there shall be the same number of arguments as parameters). Each
argument that corresponds to a declared parameter shall have a type
such that its value may be assigned to an object with the unqualified
version of the type of its corresponding parameter.
========
[Item 19, based on PC-UK0003]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.5.2.2, 7.15.1.1
Title: Adjustment to permitted incompatible argument types
Detailed description:
At the Menlo Park meeting we agreed to amend 6.5.2.2p6 to permit
incompatible parameter and argument types in certain cases where the
representation is required to be the same. The cases permitted form the two
bullet points at the end of the paragraph. However, the second case was
intended to be slightly wider than the wording appearing in the draft; it
should have been:
- both types are pointers to qualified or unqualified versions of /void/
or of character types.
(in other words, it should be possible to pass unsigned char * values to
parameters of type char * as well as of void *).
The same change needs to be made in 7.15.1.1p2.
========
[Item 20]
Category: Normative change to existing feature retaining the original intent
Committee Draft subsection: 6.5.3.4
Title: Forbid sizeof bit-fields when not lvalues
Detailed description:
Consider the expression:
sizeof func().bit_field
This is currently not forbidden by 6.5.3.4p1 because it is not an lvalue.
This is clearly an oversight.
Change 6.5.3.4p1 to:
The sizeof operator shall not be applied to an expression that has
function type, an incomplete type, a bit-field type, or to the
parenthesized name of any such type.
========
[Item 21, based on PC-UK0017]
Category: Editorial
Committee Draft subsection: 6.5.9
Title: tidy up changes to pointer comparison
Detailed description:
Though the wording is mostly correct, 6.5.9p6 does not completely cover every
case and does not make it clear that it is exhaustive.
Append to 6.5.9p6:
Otherwise they shall compare unequal.
Append to footnote 80:
Two different subobjects of an object are not "the same object" and
pointers to them compare unequal.
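
The intended results can be sketched as follows. The structure and
function names are hypothetical.

```c
struct pair {
    int first;
    int second;
};

/* Pointers to two different members of one object: expected to compare
   unequal under the proposed wording. */
int members_compare_unequal(void)
{
    struct pair p = { 1, 2 };
    int *a = &p.first;
    int *b = &p.second;

    return a != b;
}

/* Two pointers to the same subobject: expected to compare equal. */
int same_member_compares_equal(void)
{
    struct pair p = { 1, 2 };
    int *a = &p.first;
    int *b = &p.first;

    return a == b;
}
```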
========
[Item 22, based on PC-UK0040]
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.7.2.1
Title: Bitfields of unsupported types should require a diagnostic.
Detailed description:
If a bitfield is declared with a type other than /_Bool/ or plain, signed,
or unsigned int, the behavior is undefined. Since this can easily be
determined at compile time, a diagnostic should be required. It is
reasonable to exempt other integer types that the implementation knows how
to handle.
Add to the end of 6.7.2.1p3:
A bit-field shall have a type that is a qualified or unqualified
version of /_Bool/, /signed int/ or /unsigned int/, or of some other
implementation-defined integer type.
Delete the first sentence of 6.7.2.1p8.
Note that this wording allows additional implementation-defined bitfield
types so long as they are integers. If they are not, the behaviour would
not be defined by the Standard and so a diagnostic should still be required.
An implementer can also allow non-integer bitfield types, but a diagnostic
is still required.
========
[Item 23, based on PC-UK0007]
Category: Other: outstanding problem
Committee Draft subsection: 6.7.3.1
Title: Problem with restrict and string literals
Detailed description:
Consider any function which takes two char * parameters where one of them
is restrict-qualified and a call where the corresponding arguments are
both string literals. For example:
char *s = "test string\n";
printf ("This - %s - is the test string\n", s);
Because of the restrict qualification, it is not permitted for the two
strings to share storage. However, an implementation is entitled to let the
literals do so, quite possibly without the programmer realizing that the
situation happened (for example, the first parameter might be a macro
defined in a makefile).
A similar situation occurs when compound literals share storage; in this
case the parameters might have almost any restrict-qualified type.
One solution would be to exempt unmodifiable objects from the requirements
of restrict. Another would be to adopt alternative semantics for restrict
as proposed by the UK (that the object pointed to by a restrict-qualified
pointer is either not altered or is only accessed via that pointer).
========
[Item 24, based on PC-UK0042]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.7.4
Title: Clarify some aspects of inline
Detailed description:
A good inlining implementation can inline calls to the comparison function
of qsort and other indirect calls. It should be clearer that this is
permitted.
In 6.7.4p6, add a footnote referenced at the end of the last but one
sentence ("An inline definition provides ... the same translation unit"):
[*] The call need not be due to the direct appearance of the name of
the function at the point of calling; it may be through some kind of
indirection.
========
[Item 25, based on PC-UK0042]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.7.4
Title: Clarify some aspects of inline
Detailed description:
The exact relationship between the inline and extern keywords is not
obvious, particularly when the extern declaration of an inline function
occurs after its definition. For this reason it should be made clearer in
the examples. In 6.7.4p8, after:
because /fahr/ is also declared with /extern/
add:
(even though that declaration is not visible at the definition of
/fahr/)
========
[Item 26, based on PC-UK0167]
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.7.5.2
Title: require side effects in VLA declarations to work normally
Detailed description:
6.7.5.2p3 states in part:
It is unspecified whether side effects are produced when the size
expression is evaluated.
This rule will be extremely confusing to the normal programmer. It places
an unreasonable burden on anyone who needs to write code with side-effects
(particularly if the size is determined via a function call), and it does
not offer any significant benefit to the implementation; to see this,
consider that, however the implementation handles:
int vla [n++][func()];
it must correctly handle the equivalent code:
int vla_size [2] = { n++, func () };
int vla [vla_size [0]][vla_size [1]];
Other issues, such as the order of side effects, can be ignored here and
handled in the same way as elsewhere in the Standard. See the WG14
archives for a fuller discussion of the topic.
Change required: delete this sentence.
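
The equivalent form quoted above can be exercised directly on an
implementation that supports variable length arrays. This is a sketch;
the names two, vla_elements and side_effect_calls are hypothetical.

```c
#include <stddef.h>

static int calls = 0;

/* A size function with an observable side effect. */
static int two(void)
{
    calls++;
    return 2;
}

/* Declares a VLA whose bounds come from expressions with side effects,
   evaluated first into an ordinary array, and returns the number of
   int elements in the VLA. */
size_t vla_elements(int *n)
{
    int vla_size[2] = { (*n)++, two() };
    int vla[vla_size[0]][vla_size[1]];

    return sizeof vla / sizeof vla[0][0];
}

/* Reports how many times the size function has been called. */
int side_effect_calls(void)
{
    return calls;
}
```

Both side effects take place exactly once here, which is what a
programmer would also expect of the direct declaration.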
========
[Item 27, based on PC-UK0046]
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.7.7
Title: Correct ranges of bitfields in an example
Detailed description:
Example 3 in 6.7.7p6 describes the ranges of various bit-fields in terms
of "at least the range". This is because C89 was not clear on what the
permitted ranges of integer types were. These ranges are now tightly
specified by 6.2.6.2, and so the wording of this example should be altered
accordingly:
- change "at least the range [-15, +15]"
to "either the range [-15, +15] or the range [-16, +15]"
- change "values in the range [0, 31] or values in at least the range
[-15, +15]"
to "values in one of the ranges [0, 31], [-15, +15], or [-16, +15]"
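
The ranges can be checked with 5-bit fields like those in the example.
This is a sketch with hypothetical names; it only stores values that are
in range under every permitted representation, plus one out-of-range
unsigned value whose reduction modulo 32 is well defined.

```c
struct fields {
    unsigned int u5 : 5;    /* values in the range [0, 31] */
    signed int   s5 : 5;    /* [-15, +15] or [-16, +15] */
};

/* Stores v in the unsigned field and reads it back. */
unsigned int store_u5(unsigned int v)
{
    struct fields f = { 0, 0 };

    f.u5 = v;
    return f.u5;
}

/* Stores v in the signed field and reads it back. */
int store_s5(int v)
{
    struct fields f = { 0, 0 };

    f.s5 = v;
    return f.s5;
}
```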
========
[Item 28, based on PC-UK0014]
Category: Inconsistency
Committee Draft subsection: 6.7.8
Title: problems with initializing unsigned char arrays.
Detailed description:
Consider the following declaration:
unsigned char s [] = "\x80\xff";
The first element of the string literal has the value:
(char) 128
and the second element has the value:
(char) 255
If the type char is signed and CHAR_MAX is less than 128, these two
expressions are implementation-defined. In particular, on a ones-
complement implementation likely values are -127 and -0 respectively.
When these are converted back to unsigned char during the initialization,
then (if UCHAR_MAX is 255) they will be converted to 129 and 0 respectively.
This is *not* intuitive. Furthermore, while I do not have access to an
implementation with ones-complement arithmetic, I suspect that they apply
"copy bytes" semantics for initialization, rather than the "double cast"
semantics that the strict wording requires.
The following changes provide the more intuitive semantics:
Append to 6.7.8p14:
The value of each element is determined by converting the corresponding
numerical representation of the mapped character, or the octal or
hexadecimal escape sequence, directly to the array element type,
not via the type char.
Append to example 8 in 6.7.8p32:
The declaration:
unsigned char c [] = "\xFF";
is identical to:
unsigned char c [2] = { 0xFF, 0 };
and not to:
unsigned char c [2] = { (unsigned char)(char) 0xFF, 0 };
(the latter could be different if /CHAR_MAX/ is less than 255 and
the implementation-defined value of the expression /(char) 0xFF/
is not equal to /254-UCHAR_MAX/).
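
On a twos-complement implementation with UCHAR_MAX equal to 255, both
readings give the same answer, which is why the problem is easy to miss.
The sketch below (hypothetical names) shows the values that the proposed
wording would guarantee everywhere.

```c
/* The declaration from this item, given static storage duration. */
static const unsigned char s[] = "\x80\xff";

int first_byte(void)
{
    return s[0];
}

int second_byte(void)
{
    return s[1];
}

/* Two characters from the literal plus the terminating null. */
int array_length(void)
{
    return (int)sizeof s;
}
```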
========
[Item 29, based on PC-UK0072]
Category: Feature that should be included
Committee Draft subsection: 7.14.1.1, 7.20.4
Title: _exit function
Detailed description:
As part of a working paper (N789), I suggested that C provide an _exit()
function like that in POSIX, and signal handlers should be allowed to
call this function. The Menlo Park meeting agreed to add this function
unless an unresolvable technical issue was found that would make it not
conformant to POSIX - no such issue has been raised. Since the meeting I
have made some minor improvements to the wording.
In 7.14.1.1p5, change:
or the signal handler calls any function in the standard library
other than the /abort/ function or the /signal/ function
to:
or the signal handler calls any function in the standard library
other than the /abort/ function, the /_exit/ function, or the
/signal/ function
Add a new subclause within 7.20.4:
7.20.4.X The _exit function
Synopsis
#include <stdlib.h>
void _exit (int status);
Description
The /_exit/ function causes normal program termination to occur,
and control to be returned to the host environment. No functions
registered by the /atexit/ function or signal handlers registered by
the /signal/ function are called. The /_exit/ function never returns
to the caller. The status returned to the implementation is determined
in the same manner as for the /exit/ function. It is implementation-
defined whether open output streams are flushed, open streams closed,
or temporary files removed.
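The intended behaviour can be observed today on a POSIX system, where
/_exit/ is declared in <unistd.h> rather than <stdlib.h>. The sketch below
assumes POSIX (/fork/, /waitpid/); the function names are hypothetical. A
child process registers an /atexit/ handler that would change the exit
status to 99 if it ran, and then calls /_exit/ with status 7.

```c
#include <stdlib.h>     /* atexit */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>     /* fork, _exit (POSIX, an assumption here) */

/* Registered with atexit; it must not run when the child calls _exit.
   If it did run, the exit status seen by the parent would be 99. */
static void handler (void)
{
    _exit (99);
}

/* Hypothetical demonstration: returns the exit status of a child that
   registers an atexit handler and then calls _exit (7), or -1 if the
   child could not be created or did not terminate normally. */
int child_exit_status (void)
{
    int status;
    pid_t pid = fork ();
    if (pid < 0) return -1;
    if (pid == 0) {
        atexit (handler);
        _exit (7);          /* terminates without calling handler */
    }
    if (waitpid (pid, &status, 0) != pid || !WIFEXITED (status)) return -1;
    return WEXITSTATUS (status);
}
```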
========
[Item 30, based on PC-UK0056]
Category: Feature that should be included
Committee Draft subsection: 7.17
Title: Add a symbol giving the maximum alignment
Detailed description:
When writing functions that use the results of malloc et al. in a general
way (such as malloc wrappers) it is necessary to know what the worst
possible alignment is. This value is known to the implementation in order
to provide malloc in the first place, but cannot be derived by an
application program. Thus it is eminently suitable for standardisation.
Typical use might be:
#include <stdlib.h>   /* malloc, free */
#include <time.h>     /* time_t, time */
struct mallocinfo { char *file; unsigned line; time_t time; };
#define HDRSIZE (((sizeof (struct mallocinfo) - 1) / _ALIGNMENT_ALL \
                  + 1) * _ALIGNMENT_ALL)
void *my_malloc (size_t n, char *file, unsigned line)
{
    unsigned char *p = malloc (n + HDRSIZE);
    if (p == NULL) return p;
    struct mallocinfo *h = (struct mallocinfo *) p;
    h->file = file;
    h->line = line;
    h->time = time (NULL);    /* record when the allocation was made */
    return p + HDRSIZE;
}
void my_free (void *p)
{
    if (p != NULL)
        free ((unsigned char *) p - HDRSIZE);
}
[I eventually decided <stddef.h> is the right place for this.]
Add a new macro to <stddef.h>:
_ALIGNMENT_ALL
which expands to an integer constant expression that has type /size_t/,
the value of which is the least common multiple of the alignments of
all object types.[*]
[*] If /p/ has pointer to character type and is suitably aligned for
some type /t/, then /(p + _ALIGNMENT_ALL)/ is also suitably aligned
for the same type /t/, no matter what /t/ is.
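The rounding used for HDRSIZE in the typical use above is the usual
"round up to a multiple" formula. As a sketch, with a hypothetical function
name and the alignment passed as a parameter in place of the proposed
macro:

```c
#include <stddef.h>

/* Rounds size up to the next multiple of align (align must be non-zero),
   using the same formula as HDRSIZE: ((size - 1) / align + 1) * align.
   A size of zero is handled separately because size - 1 would wrap. */
size_t round_up (size_t size, size_t align)
{
    if (size == 0) return 0;
    return ((size - 1) / align + 1) * align;
}
```

Because the result is a multiple of the alignment, adding it to a pointer
returned by malloc gives a pointer that is still suitably aligned for any
type, which is the property the footnote describes.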
========
[Item 31, based on PC-UK0169]
Category: Feature that should be included
Committee Draft subsection: 7.17
Title: relax restrictions on the offsetof macro
Detailed description:
The offsetof macro currently requires its first argument to be a structure
type, and it is unclear what the second argument may be. There is no particular
reason to forbid unions for the first argument, nor to forbid complex
constructs for the second argument, provided only that the address constant
requirement continues to hold.
In 7.17p3, change "structure" to "structure or union" in two places,
and change:
The /member-designator/ shall be such that given
to:
The /member-designator/ may be any construct, provided that given
and add a footnote to the end of the paragraph:
[*] Thus the member-designator may be a construct like /m [2]/ or
/a.b.c/. The offset of any member of a union is 0.
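The kinds of use that the relaxed wording is meant to permit can be
sketched as follows. The types and function names are hypothetical. Many
existing compilers already accept these forms, but that is an observation
about common practice, not something the current wording guarantees.

```c
#include <stddef.h>

struct inner { char tag; double c; };
struct outer { char x; int m [4]; struct { struct inner b; } a; };
union  u     { int i; double d; char bytes [8]; };

/* Offset of an array element named in the member-designator. */
size_t offset_of_m2 (void)      { return offsetof (struct outer, m [2]); }

/* Offset of a nested member named in the member-designator. */
size_t offset_of_a_b_c (void)   { return offsetof (struct outer, a.b.c); }

/* Offset of a union member; every member of a union is at offset 0. */
size_t offset_of_union_d (void) { return offsetof (union u, d); }
```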
========
[Item 32, based on PC-UK0057]
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.19.2, 7.24.3.5, 7.24.6
Title: Better locale handling for wide oriented streams
Detailed description:
7.19.2p6 associates an /mbstate_t/ object with each stream, and 7.19.3p11-13
state that this is used with the various I/O functions. On the other hand,
7.24.6p3 places very strict restrictions on the use of such objects,
restrictions that cannot be met through the functions provided in the
Standard while allowing convenient use of wide formatted I/O.
Furthermore, an /mbstate_t/ object is tied to a single locale based on the
first time it is used. This means that a wide oriented stream is tied to
the locale in use the first time it is read or written. This will be
surprising to many users of the Standard.
Therefore, at the very least these objects should be exempt from the
restrictions of 7.24.6; the restrictions of 7.19 (for example, 7.19.2p5
bullet 2) are sufficient to prevent unreasonable behaviour. In addition,
the locale of the object should be fixed and not affected by the current
locale. The most sensible way to do this is to use the locale in effect
when the file is opened, but allow /fwide/ to override this.
In 7.19.2p6, add after the first sentence:
This object is not subject to the restrictions on direction of use
and of locale that are given in subclause 7.24.6. All conversions
using this object shall take place as if the /LC_CTYPE/ category
setting of the current locale is the setting that was in effect when
the orientation of the stream was set with the /fwide/ function or,
if this has not been used, when the stream was opened with the
/fopen/ or /freopen/ function.
In 7.24.3.5, add a new paragraph after paragraph 2:
If the stream is successfully made wide oriented, the /LC_CTYPE/
category that is used with the /mbstate_t/ object associated with
the stream shall be set to that of the current locale.
In 7.24.6p3, append:
These restrictions do not apply to the /mbstate_t/ objects associated
with streams.
========
[Item 33, based on PC-UK0058]
Category: Request for information/clarification
Committee Draft subsection: 7.19.4.3
Title: Unclear how many times tmpfile() can be called.
Detailed description:
Nowhere does the Standard state how many times tmpfile() can be called, nor
does it state that several successful calls will actually access different
files.
Append to 7.19.4.3p2:
The file will be different from any other existing file, including any
opened by a previous successful call to the /tmpfile/ function.
Add a new part to 7.19.4.3:
Recommended practice
It should be possible to open at least /TMP_MAX/ temporary files
during the lifetime of the program, and there should be no limit on
the number simultaneously open other than this limit and any limit on
the number of open streams (/FOPEN_MAX/). The limit of /TMP_MAX/ could
be shared with calls to /tmpnam/.
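The property that the appended sentence would guarantee can be tested with
a small program. This is only a sketch and the function name is
hypothetical: it opens two temporary files, writes a different byte to
each, and checks that each file reads back its own byte.

```c
#include <stdio.h>

/* Hypothetical check: returns 1 if two successful tmpfile calls yield
   independent files, 0 if they appear to share contents, and -1 if a
   temporary file could not be created. */
int tmpfiles_are_distinct (void)
{
    FILE *a = tmpfile ();
    FILE *b = tmpfile ();
    int result = -1;
    if (a != NULL && b != NULL) {
        /* Write a different byte to each file, then read both back. */
        fputc ('A', a);
        fputc ('B', b);
        rewind (a);
        rewind (b);
        result = (fgetc (a) == 'A' && fgetc (b) == 'B');
    }
    if (a != NULL) fclose (a);
    if (b != NULL) fclose (b);
    return result;
}
```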
========
[Item 34, based on PC-UK0064]
Category: Request for information/clarification
Committee Draft subsection: 7.19.8.1, 7.19.8.2
Title: Clarify the actions of fread and fwrite
Detailed description:
The exact behaviour of fread and fwrite is not well specified, particularly
on text streams but in actuality even on binary streams. These changes
apply the obvious semantics.
In 7.19.8.1p2, add after the first sentence:
For each object, /size/ calls are made to the /fgetc/ function and
the results stored, in the order read, in an array of /unsigned char/
exactly overlaying the object.
In 7.19.8.2p2, add after the first sentence:
For each object, /size/ calls are made to the /fputc/ function, taking
the values (in order) from an array of /unsigned char/ exactly
overlaying the object.
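The semantics proposed for fread can be written out as a model. The
following is a sketch only, with hypothetical function names; it is not
suggested that implementations be written this way, merely that they behave
as if they were.

```c
#include <stdio.h>

/* Hypothetical model of fread: for each object, makes size calls to fgetc
   and stores the results, in order, in the bytes overlaying the object.
   Returns the number of complete objects read. */
size_t model_fread (void *ptr, size_t size, size_t nmemb, FILE *stream)
{
    unsigned char *bytes = ptr;
    size_t n, i;
    if (size == 0) return 0;
    for (n = 0; n < nmemb; n++) {
        for (i = 0; i < size; i++) {
            int ch = fgetc (stream);
            if (ch == EOF) return n;    /* a partial object is not counted */
            bytes [n * size + i] = (unsigned char) ch;
        }
    }
    return n;
}

/* Hypothetical self-check: writes 5 bytes, then asks for 3 objects of
   2 bytes each. Returns the number of complete objects read (0 if the
   stored bytes are wrong), or -1 if no temporary file is available. */
int model_fread_demo (void)
{
    FILE *f = tmpfile ();
    unsigned char buf [6] = { 0 };
    size_t got;
    if (f == NULL) return -1;
    fputs ("abcde", f);
    rewind (f);
    got = model_fread (buf, 2, 3, f);   /* only 2 complete objects exist */
    fclose (f);
    return (buf [0] == 'a' && buf [3] == 'd') ? (int) got : 0;
}
```

The model for fwrite is the mirror image, with fputc in place of fgetc.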
========
[Item 35, based on PC-UK0063]
Category: Feature that should be included
Committee Draft subsection: 7.19.9
Title: Provide a way to compare fpos_t values.
Detailed description:
There is no way to determine whether two fpos_t values represent the same
position in a file. Therefore, it is not possible to do operations such as
the following:
- open a file
- move through it, looking for some mark
- note the position using fgetpos()
- rewind
- move through it again to the same position, using calls to fgetpos()
to determine where you are, rather than relying on having made exactly
the same sequence of reads and seeks
Add a new function to 7.19.9:
7.19.9.6 The fcmppos function
Synopsis
#include <stdio.h>
struct fcmppos fcmppos (fpos_t *pos1, fpos_t *pos2, FILE *stream);
Description
The /fcmppos/ function compares the values pointed to by /pos1/ and
/pos2/, which must both refer to the stream /stream/. If either of the
first two arguments is a null pointer, the result of a call to the
/fgetpos/ function on the stream is used instead. If the stream has
been written to at any point before the later of the two positions,
the behaviour is undefined.
Returns
The value returned is a structured type containing at least the
following fields:
int before; // Less than, equal to, or greater than zero according
// to whether /*pos1/ is before, at the same location
// as, or after /*pos2/ in the file.
int mbstate; // Zero if and only if the two positions have the same
// multibyte parsing status.
It will also be necessary to add /struct fcmppos/ to the start of 7.19.
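For comparison, the nearest that a program can get today is sketched below.
It is a stand-in under stated assumptions, not an implementation of the
proposed function: it works only where /ftell/ gives meaningful offsets
(for example binary streams), it has to move the file position to do the
comparison, and it says nothing about multibyte parsing state. All names
are hypothetical.

```c
#include <stdio.h>

/* Hypothetical stand-in for the "before" result of the proposed function.
   Returns <0, 0 or >0 as *pos1 is before, at, or after *pos2, and stores
   1 in *ok on success or 0 on failure. The stream's current position is
   saved first and restored afterwards. */
int compare_positions (FILE *stream, const fpos_t *pos1,
                       const fpos_t *pos2, int *ok)
{
    fpos_t saved;
    long off1, off2;
    *ok = 0;
    if (fgetpos (stream, &saved) != 0) return 0;
    if (fsetpos (stream, pos1) != 0) return 0;
    off1 = ftell (stream);
    if (fsetpos (stream, pos2) != 0) return 0;
    off2 = ftell (stream);
    if (fsetpos (stream, &saved) != 0) return 0;
    if (off1 < 0 || off2 < 0) return 0;
    *ok = 1;
    return (off1 > off2) - (off1 < off2);
}

/* Hypothetical self-check: returns 1 if the comparisons give the expected
   answers, 0 if not, and -1 if no temporary file is available. */
int compare_positions_demo (void)
{
    FILE *f = tmpfile ();
    fpos_t early, late;
    int ok1, ok2, r1, r2;
    if (f == NULL) return -1;
    fputs ("0123456789", f);
    rewind (f);
    fgetc (f);
    fgetpos (f, &early);            /* after 1 character */
    fgetc (f);
    fgetc (f);
    fgetpos (f, &late);             /* after 3 characters */
    r1 = compare_positions (f, &early, &late, &ok1);
    r2 = compare_positions (f, &late, &late, &ok2);
    fclose (f);
    return ok1 && ok2 && r1 < 0 && r2 == 0;
}
```

The need to seek, and the restriction to streams where offsets are
meaningful, are exactly what a library-provided comparison would avoid.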
========
[Item 36, based on PC-UK0061]
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.2.1.1
Title: Explicitly allow assert on non-Boolean arguments
Detailed description:
DR 107 asked questions about the assert macro (it was written when the
parameter type was given as int). Part c asked:
Must a conforming implementation convert the value yielded by the
expression given in an invocation of the assert macro to type int
before checking to see if it compares equal to zero?
and the answer given was "no". Other parts of the response stated:
Passing a non-int argument in such a context will render the
translation unit not strictly conforming.
and:
a violation of this requirement results in undefined behavior
It is clear from these, as well as from a reasoned consideration of 6.10,
that the argument to the assert macro is *not* converted to the required
type, but must already have that type. This means that expressions such
as:
assert (n > 0) // new problem in CD2 - int is not _Boolean
assert (p != NULL) // new problem in CD2 - int is not _Boolean
assert (1U) // problem in C89 - unsigned int is not int
assert (2.5) // problem in C89 - double is not int
all produce undefined behavior. The change made between CD1 and CD2 has
only exacerbated this, requiring explicit casts of comparisons:
assert ((_Bool) (n != 0))
The wording changes required to fix this are simple and do not affect the
spirit of assert. The implementation is also trivial - the definition of
the assert macro might need to have "expression" changed to either
"!!(expression)" or "(expression) != 0" where it is tested, though it is
possible that the existing definition might already be valid.
In 7.2.1.1p1, change "_Bool expression" to "scalar expression", where the
word "scalar" is in italics. Add to paragraph 2, either after the first
sentence or at the end:
The argument of the /assert/ macro may be any expression with scalar
type.
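A sketch of the kind of definition described above follows. The macro is
named CHECK, and the helper check_failed, purely to avoid clashing with
<assert.h>; both names are hypothetical. Comparing against zero accepts any
scalar type and avoids the loss of value that a conversion to /int/ would
cause.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical failure routine: reports the failed expression and aborts. */
void check_failed (const char *text, const char *file, int line)
{
    fprintf (stderr, "Assertion failed: %s, file %s, line %d\n",
             text, file, line);
    abort ();
}

/* Hypothetical macro in the style of assert: the argument is compared
   with zero in its own type, so any scalar expression is acceptable. */
#define CHECK(expression) \
    ((expression) != 0 ? (void) 0 \
                       : check_failed (#expression, __FILE__, __LINE__))

/* The test itself, exposed as a value so that it can be observed without
   aborting the program. */
#define SCALAR_IS_TRUE(expression) ((expression) != 0)
```

For example, 0.25 compares unequal to zero and so passes the test, whereas
converting 0.25 to /int/ first would give zero.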
========
[Item 37, based on PC-UK0067]
Category: Other: tidy up (technically normative)
Committee Draft subsection: 7.20
Title: tidy up definitions of <stdlib.h> macros
Detailed description:
In 7.20p3, change:
EXIT_SUCCESS
which expand to integer expressions which ...
to:
EXIT_SUCCESS
which expand to integer constant expressions which ...
and change:
MB_CUR_MAX
which expands to a positive integer expression whose value
... never greater than /MB_LEN_MAX/.
to:
MB_CUR_MAX
which expands to a positive integer expression whose type is /size_t/
and whose value ... never greater than /MB_LEN_MAX/. This is not a
constant expression: it may change whenever the locale changes.
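The practical consequence is that the value has to be fetched at run time.
A minimal sketch, with a hypothetical function name, that checks the
documented range for whatever locale is current:

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <stdlib.h>   /* MB_CUR_MAX */

/* Returns 1 if the current value of MB_CUR_MAX is positive and not
   greater than MB_LEN_MAX, and 0 otherwise. */
int mb_cur_max_in_range (void)
{
    size_t n = MB_CUR_MAX;   /* read at run time; may differ after setlocale */
    return n >= 1 && n <= (size_t) MB_LEN_MAX;
}
```

Code that needs a compile-time bound, such as an array size at file scope,
should use /MB_LEN_MAX/ instead.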
========
[Item 38, based on PC-UK0070]
Category: Feature that should be included
Committee Draft subsection: 7.22
Title: Type-generic macros should be generally useful
Detailed description:
7.9 introduces the concept of type-generic macros, but these are only
available for a small range of mathematical functions. This facility should
be made generally available so that it can be used for general
programming.
========
[Item 39]
Category: Various (some normative)
Committee Draft subsection: 7.23
Title: various changes to <time.h>
Detailed description:
The following items constitute a number of changes to 7.23 <time.h>. Some
are editorial and some are normative. They are all included in one place
for convenience, though each item stands alone.
Sub-item 1 (editorial):
In 7.23.1p5, "tm_extlen object" should read "tm_extlen member".
Sub-item 2 (editorial):
In 7.23.2.6p2 the list should be closed up and indented, in the same style
as the lists in 7.23.1p4 and p5.
Sub-item 3 (normative):
There is an error in the algorithm in 7.23.2.6p3. The first line of the
expression for D should read:
D = Y*365 + DIV(Z,400)*97 + MOD(Z,400)/4 - MOD(Z,400)/100 +
Sub-item 4 (editorial):
In footnote 252 "401 B.C.E." should read "401 B.C.".
Sub-item 5 (editorial):
In 7.23.3.5p3, item %C, delete the "(00-99)" that seems to have appeared
since CD1 (the year is not limited to 0 to 9999).
Sub-item 6 (normative):
It would be convenient to provide a way to produce the output generated in
the "C" locale even when in another locale (for example to produce mixed
format output). This could reasonably be done by a "C" modifier (there is
no case where it makes sense to have this combined with the "E" or "O"
modifiers).
In 7.23.3.5p2 add "C" to the list of modifiers, and add a new paragraph
before p4:
The C modifier indicates that the replacement text shall be that
produced in the "C" locale, irrespective of the current locale.
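Without such a modifier, a program has to switch locales around the call.
The sketch below shows that workaround; the function name is hypothetical,
and the approach is not suitable where other threads or signal handlers
depend on the locale.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: formats like strftime, but always with the LC_TIME
   category of the "C" locale, then restores the previous setting.
   Returns what strftime returns, or 0 if the locale could not be
   queried or switched. */
size_t strftime_c_locale (char *s, size_t maxsize, const char *format,
                          const struct tm *timeptr)
{
    size_t n;
    char *saved;
    const char *current = setlocale (LC_TIME, NULL);
    if (current == NULL) return 0;
    /* Copy the name: later setlocale calls may overwrite the string. */
    saved = malloc (strlen (current) + 1);
    if (saved == NULL) return 0;
    strcpy (saved, current);
    if (setlocale (LC_TIME, "C") == NULL) {
        free (saved);
        return 0;
    }
    n = strftime (s, maxsize, format, timeptr);
    setlocale (LC_TIME, saved);     /* restore the caller's setting */
    free (saved);
    return n;
}
```

Mixed-format output then needs one call per locale and concatenation of the
results, which is the inconvenience the proposed modifier would remove.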
Sub-item 7 (normative):
The "E" and "O" modifiers could be made more general while, at the same
time, making their meanings clearer. The following wording replaces
7.23.3.5p4 completely, though in principle one changed modifier could be
adopted without the other. If this change is not made, it should be noted
that "%OW" has been incorrectly written as "%Ou".
[E modifier]
The E modifier indicates that a locale-specific alternate calendar
shall be used. All specifiers whose replacement depends on the date
shall use the alternate calendar, and the replacement text shall depend
on all the members tm_year, tm_mon, and tm_mday as well as any listed
for the specifier. If there is no such alternate system, the modifier
is ignored.
If the alternate system makes use of base years (also known as eras)
and offsets from the base, then the following specifiers have different
meanings:
%EC is replaced by the name of the base year or era.
%Ey is replaced by the offset from %EC (year only).
%EY is replaced by the locale's full alternative year
representation including both era and offset.
[O modifier]
The O modifier indicates that a locale-specific set of alternative
numeric symbols is to be used instead of decimal digits in the text
replacing the conversion specifier. If there are no alternative
numeric symbols, the modifier is ignored.
========
[Item 40, based on PC-UK0031]
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.4.1.8
Title: make ispunct() true for basic punctuation characters
Detailed description:
In C89, including after the addition of NA1, the definition of ispunct()
was:
The ispunct function tests for any printing character that is
neither space (' ') nor a character for which isalnum is true.
At some time during the revision process this definition has been changed;
this is a Quiet Change with no obvious rationale. It also makes it
impossible to predict what will happen in the "C" locale.
Preferably this wording should be restored. Alternatively, wording should
be adopted that at least returns true for the required 29 punctuation
characters in the "C" locale. The following change uses wording analogous
to that in other <ctype.h> functions and has the benefit that it clearly
defines the results in the "C" locale without leaving them up to the
implementation.
Replace 7.4.1.8p2 by:
The /ispunct/ function tests for any character that is one of the
29 graphic characters in the basic execution character set or is
one of a locale-specific set of printing characters for which neither
/isspace/ nor /isalnum/ is true. In the "C" locale it returns true
only for the characters in the basic execution character set.
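The requirement for the "C" locale is easy to test. The following sketch
uses a hypothetical function name; it counts how many of the 29 graphic
characters are reported as punctuation.

```c
#include <ctype.h>
#include <locale.h>
#include <string.h>

/* The 29 graphic characters of the basic execution character set. */
static const char graphic [] = "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~";

/* Returns how many of the 29 graphic characters satisfy ispunct in the
   "C" locale. Note that this leaves LC_CTYPE set to "C". */
int count_basic_punct (void)
{
    size_t i;
    int count = 0;
    setlocale (LC_CTYPE, "C");
    for (i = 0; i < strlen (graphic); i++)
        if (ispunct ((unsigned char) graphic [i]))
            count++;
    return count;
}
```

Under either the C89 definition or the replacement wording the count must
be 29.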
========
[Item 41]
Category: Editorial
Committee Draft subsection: D.1
Title: Minor edit to clarify interpretation
Detailed description:
In D.1p2, change "to follow" to "to follow exactly". The point is not that
the annex is normative, but that it is to be applied "as is" rather than to
the exact letter.
========
[Item 42]
Category: Inconsistency
Committee Draft subsection: D.5
Title: Minor correction to an example in Annex D
Detailed description:
Replace D.5p9 with:
Clearly there is no undefined behavior.
[The existing text is clearly wrong.]
========
[Item 43, based on PC-UK0066]
Category: Inconsistency
Committee Draft subsection: various
Title: The term "access" is not well defined.
Detailed description:
The term "access" is not well defined. From context, it is most often used
to mean "read or write the value" but sometimes to mean "read the value".
This ambiguity sometimes makes it hard to understand what is actually meant.
It appears that some work on this has been done since CD1, and 6.7.3.1p5
makes it clear that "read or write" should be the meaning. However, this
does not obviously apply to the whole document; it ought to be made clear
and the remaining "read the value" uses changed.
Add a new subclause to clause 3:
3.X
access
(in the context of execution-time actions) to read or modify the
value of an object; expressions that are not evaluated do not access
objects.
NOTE 1 Where only one of these two actions is meant, the term "read"
or "modify" is used.
NOTE 2 The term "modify" includes the case where the new value of the
object is the same as the previous value.
Delete the following words from 6.7.3.1p5:
An access to a value means either fetching it or modifying it;
expressions that are not evaluated do not access values.
The following uses of "access" or its inflections need to be changed:
6.2.6.1p4 ("accessed" -> "read")
6.5p2 ("accessed" -> "read")
Footnote 68 ("accessing" -> "reading")
6.5.16.1p3 ("accessed" -> "read")
and the corresponding bullets in annex K.
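The second half of the proposed definition ("expressions that are not
evaluated do not access objects") can be illustrated with /sizeof/. This is
a sketch with a hypothetical function name.

```c
#include <stddef.h>

/* Returns the value of i after sizeof has been applied to an expression
   that would modify i if it were evaluated. The operand of sizeof is not
   evaluated here, so i is neither read nor modified. */
int value_after_unevaluated_increment (void)
{
    int i = 5;
    size_t n = sizeof (i++);   /* no access to i takes place */
    (void) n;                  /* n is needed only to hold the result */
    return i;
}
```

By contrast, an evaluated assignment such as /i = i/ both reads and
modifies /i/, even though the stored value is unchanged (NOTE 2).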