JTC1/SC22/WG14
N868
ISO/IEC JTC 1/SC22
Programming languages, their environments and system software interfaces
Secretariat: U.S.A. (ANSI)
ISO/IEC JTC 1/SC22
N2872
TITLE:
Summary of Voting on Final CD Ballot for FCD 9899 - Information technology
- Programming languages - Programming Language C (Revision of ISO/IEC
9899:1990)
DATE ASSIGNED:
1999-01-12
BACKWARD POINTER:
N/A
DOCUMENT TYPE:
Summary of Voting
PROJECT NUMBER:
JTC 1.22.20.01
STATUS:
WG14 is requested to prepare a Disposition of Comments Report and make a
recommendation on the further processing of the FCD.
ACTION IDENTIFIER:
FYI
DUE DATE:
N/A
DISTRIBUTION:
Text
CROSS REFERENCE:
SC22 N2794
DISTRIBUTION FORM:
Def
Address reply to:
ISO/IEC JTC 1/SC22 Secretariat
William C. Rinehuls
8457 Rushing Creek Court
Springfield, VA 22153 USA
Telephone: +1 (703) 912-9680
Fax: +1 (703) 912-2973
email: rinehuls@access.digex.net
_________ end of title page; beginning of voting summary ___________
SUMMARY OF VOTING ON
Letter Ballot Reference No: SC22 N2794
Circulated by: JTC 1/SC22
Circulation Date: 1998-08-24
Closing Date: 1999-01-08
SUBJECT: Final CD Ballot for FCD 9899 - Information technology -
Programming languages - Programming Language C (Revision of
ISO/IEC 9899:1990)
----------------------------------------------------------------------
The following responses have been received on the subject of approval:
"P" Members supporting approval
without comments 8
"P" Members supporting approval
with comments 4
"P" Members supporting approval
with comments not yet received 1
"P" Members not supporting approval 3
"P" Members abstaining 2
"P" Members not voting 4
"O" Members supporting approval
without comments 1
--------------------------------------------------------------------
Secretariat Action:
WG14 is requested to prepare a Disposition of Comments Report and make a
recommendation on the further processing of the FCD.
The comment accompanying the abstention vote from Austria was: "Lack of
expert resources."
The comments accompanying the affirmative votes from Canada, France,
Norway and the United States of America are attached, along with the
comments accompanying the negative votes from Denmark, Japan and the
United Kingdom.
Germany has advised that the comments accompanying their affirmative vote
"will follow within the next ten days". Upon receipt, those comments will
be distributed as a separate SC22 document.
_____ end of voting summary; beginning of detailed summary __________
ISO/IEC JTC1/SC22 LETTER BALLOT SUMMARY
PROJECT NO: JTC 1.22.20.01
SUBJECT: Final CD Ballot for FCD 9899 - Information technology - Programming
languages - Programming Language C (Revision of ISO/IEC 9899:1990)
Reference Document No: N2794 Ballot Document No: N2794
Circulation Date: 1998-08-24 Closing Date: 1999-01-08
Circulated To: SC22 P, O, L Circulated By: Secretariat
SUMMARY OF VOTING AND COMMENTS RECEIVED
Approve Disapprove Abstain Comments Not Voting
'P' Members
Australia (X) ( ) ( ) ( ) ( )
Austria ( ) ( ) (X) (X) ( )
Belgium ( ) ( ) ( ) ( ) (X)
Brazil ( ) ( ) (X) ( ) ( )
Canada (X) ( ) ( ) (X) ( )
China (X) ( ) ( ) ( ) ( )
Czech Republic (X) ( ) ( ) ( ) ( )
Denmark ( ) (X) ( ) (X) ( )
Egypt ( ) ( ) ( ) ( ) (X)
Finland (X) ( ) ( ) ( ) ( )
France (X) ( ) ( ) (X) ( )
Germany (X) ( ) ( ) (*) ( )
Ireland (X) ( ) ( ) ( ) ( )
Japan ( ) (X) ( ) (X) ( )
Netherlands (X) ( ) ( ) ( ) ( )
Norway (X) ( ) ( ) (X) ( )
Romania ( ) ( ) ( ) ( ) (X)
Russian Federation (X) ( ) ( ) ( ) ( )
Slovenia ( ) ( ) ( ) ( ) (X)
Ukraine (X) ( ) ( ) ( ) ( )
UK ( ) (X) ( ) (X) ( )
USA (X) ( ) ( ) (X) ( )
'O' Members Voting
Korea, Republic of (X) ( ) ( ) ( ) ( )
* The Germany Member Body has advised that "comments will follow within
the next ten days". Upon receipt, these comments will be distributed
as a separate SC22 document.
--------- end of detailed summary; beginning of Canada Comments _____
From: "Doug Langlotz" (dlanglots@scc.ca)
Document number FCD 9899 (JTC 1/SC22/N2794)
Canada APPROVES WITH COMMENTS.
Canada supports approval with the following comments.
Comments:
Comment #1
Category: Normative
Committee Draft Subsection: 6.8.4 and 6.8.5
Title:
Inconsistent scoping rules for compound literals and control
statements
Description:
In 6.8.5.3, the for statement was modified when incorporating mixed
declarations and code to limit the scope of the (possible)
declaration in clause-1. However, this makes the behaviour
inconsistent for compound literals, which now have different scope
rules in a for statement than in the other control statements.
From example 8 in 6.5.2.5:
struct s { int i; };
int f (void)
{
struct s *p = 0, *q;
int j = 0;
while (j < 2)
q = p, p = &((struct s){j++});
return p == q && q->i == 1;
}
Note that if a for loop were used instead of a while loop, the
lifetime of the unnamed object would be the body of the loop
only, and on entry next time around p would be pointing to an
object which is no longer guaranteed to exist, which would result
in undefined behaviour.
The behaviour of compound literals should be made consistent by
making all of the control statements have the same scoping rules
as for loops.
Comment #2
Category: Normative
Committee Draft Subsection: 6.5.2.5
Title:
Compound literals constraint #2
Description:
Constraint #2 in 6.5.2.5 seems to have an undesirable interpretation.
The constraint is, "No initializer shall attempt to provide a value
for an object not contained within the entire unnamed object specified
by the compound literal." This seems to disallow the following
(assume the Fred type has 3 members and the George type has 2 so in
neither case are we going past the end of the object):
(Fred){1, 7, &((George){5, 6})}
when it was really meant to disallow:
(int[2]){1, 2, 3}
Perhaps the rule could be broken down into more explicit cases: no
subscript (implicit or explicit) should be beyond the bounds of the
array object that it is applied to; no more fields in a struct object
should be initialized than there are fields in that struct; only
names of members of the struct or union object being initialized may
be used for designated initializers.
Similar wording is also used in 6.7.8.
Comment #3
Category: Normative
Committee Draft Subsection: 6.3.1.3
Title:
Converting to signed integral type (based on previous Canadian
comment)
Description:
Original Comment:
Section 6.3.1.3 paragraph 3 describes the result of converting a
value with integral type to a signed type which cannot represent the
value.
It says that the result is implementation-defined; however, we
believe that the result should be undefined, analogous to the case
where an operation yields a value which cannot be represented
by the result type.
The purpose of this comment was to ensure that, when a value with
integral type is converted to a signed type which cannot represent
the value, the implementation is allowed to terminate or to fail
to translate.
Details from a note sent to the reflector:
I would claim that the use of "implementation defined" isn't
appropriate in 6.3.1.3 paragraph 3 for several reasons:
1. (possibly pedantic) The draft standard does not provide two or
more choices as required by 3.19. What is an implementation
allowed to do? Is termination legal? Is failure to translate
ruled out?
2. I interpret section 4 paragraph 3 to forbid an implementation from
failing to translate because of an overflow during a conversion to
a signed integral type. Yet this would seem to be quite
appropriate. For example, an implementation should be *allowed*
to treat the following external definition as an error ("fail to
translate"), assuming that INT_MAX is not representable in short:
short big = INT_MAX;
Contrast that with the following, for which the implementation
*is* allowed to fail to translate:
short big = INT_MAX + 1;
3. An argument similar to 2 can be made that a run-time conversion
"problem" should be allowed to be treated as an error. It isn't
clear to me whether the definition of "implementation-defined"
allows for such an interpretation. There are strong hints in the
draft standard that lead me to think that it is not the
committee's intent to allow such an interpretation. I don't
actually see that this is spelled out in the draft standard.
4. In certain cases 6.3.1.3 paragraph 3 and 6.5 paragraph 5 actually
seem to conflict. For example, I would suggest that in the
following example, both apply.
short big = (short) INT_MAX;
Clearly 6.3.1.3 paragraph 3 applies, since a conversion to a
signed integral type is being performed, and the value cannot be
represented in it. So the result must be implementation-defined.
Clearly 6.5 paragraph 5 applies since the value of the expression
is not in the range of representable values for its type. So the
behavior is undefined.
This seems to me to be a contradiction. A simple fix is to
change 6.3.1.3 paragraph 3 to call for "undefined behavior".
5. Although the wording in the draft is very similar to that in the
previous standard, there is a difference. I interpreted the old
standard's looser definition of "implementation defined" to allow
"failure to translate". This latitude is no longer available
in the draft.
6. I feel that it is an error to try to represent an out of range
value in an integral type. Yet "implementation defined" implies
that this is *not* an error (see section 4 paragraph 3). It is
in the interest of users that implementations be allowed to treat
it as an error. I admit that requiring that it be "caught" at
runtime would have a serious performance impact for many
implementations.
I'm only asking that the standard should allow this error to be
caught.
The standard should require that constant expressions with signed
integer overflow should be constraint violations. This has no
runtime cost. If this isn't acceptable, at least allow this
situation to be treated as an error ("undefined behavior"
accomplishes this).
Excerpts from 3. Terms and definitions:
3.11
1 implementation-defined behavior
unspecified behavior where each implementation documents how
the choice is made
2 EXAMPLE An example of implementation-defined behavior is the
propagation of the high-order bit when a signed integer is
shifted right.
3.19
1 unspecified behavior
behavior where this International Standard provides two or
more possibilities and imposes no requirements on which is
chosen in any instance
2 EXAMPLE An example of unspecified behavior is the order in
which the arguments to a function are evaluated.
Excerpt from 4. Conformance:
3 A program that is correct in all other aspects, operating on
correct data, containing unspecified behavior shall be a correct
program and act in accordance with 5.1.2.3.
[5.1.2.3 is "Program execution"]
All of 6.3.1.3 Signed and unsigned integers:
1 When a value with integer type is converted to another integer
type other than _Bool, if the value can be represented by the
new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted
by repeatedly adding or subtracting one more than the maximum
value that can be represented in the new type until the value
is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be
represented in it; the result is implementation-defined.
Excerpt from 6.5 Expressions:
5 If an exception occurs during the evaluation of an expression
(that is, if the result is not mathematically defined or not in
the range of representable values for its type), the behavior
is undefined.
Comment #4
Category: Normative
Committee Draft Subsection: 6.2.5
Title:
Restrictions on long long
Description:
Proposal for a change to the Draft C standard (WG14/N843)
This proposal suggests a collection of small changes to the Draft C
Standard (WG14/N843) dated August 3, 1998. The changes are intended
to isolate long long int and implementation-defined extended integer
types from the common integer types. In particular, we wish to
ensure that size_t and ptrdiff_t must be one of the common integer
types, rather than long long or an implementation-defined extended
integer type. Also, we wish to ensure that no values are converted to
long long or an implementation-defined extended integer type,
except when the conversion is explicit. For example, on a system
where integers have 32 bits, a constant like 0xFFFFFFFF should be
converted to unsigned long rather than long long.
In order to implement this principle, we suggest the following
wording changes to various sections in the draft document.
6.2.5 Types
4. There are four standard signed integer types, designated as
signed char, short int, int, and long int. (These and other types
may be designated in several additional ways, as described in
6.7.2.) There is one standard extended signed integer
type, designated as long long int. There may be additional
implementation-defined extended signed integer types. The standard
extended signed integer type and the implementation-defined
extended signed integer types are collectively called the extended
signed integer types. The standard and extended signed integer
types are collectively called signed integer types.
7.17 Common definitions <stddef.h>
<<Add to end of paragraph 2>> None of the above three types shall be
defined with an extended integer type, whether standard or
implementation-defined.
________end of Canada Comments; beginning of Denmark Comments_________
From: Charlotte Laursen <cl@ds.dk>
Date: 1999-01-06
Danish vote on FCD 9899
Ballot comments on FCD 9899, C, SC22 N2794
The DS vote is negative
The vote can be changed into an affirmative one if the following
problems are resolved satisfactorily.
1. Two functions isblank(), iswblank() are added with the description
contained in CD 9899, SC22 N2620.
2. The external linkage limit is set to 32 *characters*, and a note
is added that a character may need to be represented by more
than one byte.
3. The character terminology and its use are tidied up and brought
into consistency with SC2 terminology, e.g. 3.5, 3.14.
The ISO/IEC 9945 (POSIX) series of standards has done a related exercise.
_____ end of Denmark Comments; beginning of France Comments _________
From: ARNAUD.A.R.D.DIQUELOU@email.afnor.fr
TITLE: Ballot comments on SC22N2794 - FCD 9899 - C
STATUS: Approved AFNOR comments
France votes YES to document SC22 N2794, with the following comments.
A. First, it should be noted that, with one (important) exception,
the points raised in the previous vote (N2690, answered by N2792)
were satisfactorily resolved. The overall impression is that the
document has been vastly improved in the ongoing process of revision.
B. Then, there is a technical point related to <time.h> that we
missed the first time, and about which we propose to drop all
the new material in this area, waiting for experts (who have
already set up a working group) to design a better solution.
This point is detailed below.
C. Finally, there is long long.
Our position on this subject does not change: we feel this feature
does not need to be included in C9X (see the fully detailed analysis
in AFNOR's previous ballot comments in SC22 N2690).
The Committee answered PR at the preceding vote (i.e. "has reaffirmed
this decision on more than one occasion.")
On the other hand, we see this problem as being very minor when
compared to the goal of delivering a new revision of the C Standard,
with the added precision and new features this draft is proposing.
So we no longer require this feature to be removed from
the draft (but we would be very happy if that happened).
AFNOR detailed comments related to <time.h>
Comment 1.
Category: Feature that should be included
Committee Draft subsection: 7.23, 7.25, 7.26
Title: Useless library functions made deprecated
Detailed description:
mktime (7.23.2.3) is entirely subsumed by mkxtime (7.23.2.4).
Similar cases occur with gmtime/localtime (7.23.3.3/7.23.3.4)
vs zonetime (7.23.3.7), strftime (7.23.3.5) vs strfxtime
(7.23.3.6), and wcsftime (7.24.5.1) vs wcsfxtime (7.24.5.2).
The former functions do not add significant value over the latter
ones (in particular, execution times are similar). So if the
latter are to be kept (that is, if the comment below is dropped),
the former should be put in the deprecated state, to avoid
spurious specifications being kept for years.
Comment 2.
Category: Feature that should be removed
Committee Draft subsection: 7.23, 7.24.5.2
Title: Removal of struct tmx material in the <time.h> library subclause
Detailed description:
a) The mechanism of tm_ext and tm_extlen is entirely new to the
C Standard, so attention should be paid to the uses that can be
made of it. Unfortunately, the current text is very elliptical about
this use, particularly about the storage of the further members
referred to by 7.23.1p5.
In particular, it is impossible from the current wording to know how
to correctly free a struct tmx object whose tm_ext member is
not NULL, as in the following snippet:
// This first function is OK (provided my understanding is correct).
struct tmx *tmx_alloc(void) // allocate a new struct tmx object
{
struct tmx *p = malloc(sizeof(struct tmx));
if( p == NULL ) handle_failed_malloc("tmx_alloc");
memset(p, 0, sizeof(struct tmx)); // initialize all members to 0
p->tm_isdst = -1;
p->tm_version = 1;
p->tm_ext = NULL;
return p;
}
// This second function has a big drawback
void tmx_free(struct tmx *p) // free a previously allocated object
{
if( p == NULL ) return; // nothing to do
if( p->tm_ext ) {
// some additional members have been added by the implementation
// or by users' programs using a future version of the Standard
// since we do not know what to do, do nothing.
;
// If the members were allocated, they are now impossible to
// access, so might clobber the memory pool...
}
free(p);
return;
}
Various fixes might be thought of. Among these, I see:
- always require that allocation of the additional members be under
the control of the implementation; this way, programs should never
"free() tm_ext"; effectively, this gives these additional members
the same status as the additional members that may currently be
(and are) part of struct tm or struct tmx
- always require these additional objects to be separately dynamically
allocated. This requires that copies between two struct tmx objects
should dynamically allocate some memory to hold these objects.
In effect, this will require an additional example highlighting this
(perhaps showing what a tmxcopy(struct tmx*, const struct tmx*)
function might be).
Both solutions have pros and cons. But it is clear that the current
state, which encompasses both, is not clear enough.
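Under the second fix (separately allocated extension objects), such a tmxcopy function might look as follows. The struct tmx below is a cut-down stand-in for the draft's type, reproducing only the members relevant to the mechanism, so the whole block is a hypothetical sketch rather than draft text:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the draft's struct tmx (hypothetical, abridged). */
struct tmx {
    int    tm_sec, tm_min, tm_hour, tm_mday, tm_mon, tm_year;
    int    tm_version;
    size_t tm_extlen;   /* size in bytes of the block at tm_ext */
    void  *tm_ext;      /* separately allocated extension block, or NULL */
};

/* Deep copy under the "separately allocated" reading: the extension
 * block is duplicated so dst and src never share storage. */
int tmx_copy(struct tmx *dst, const struct tmx *src)
{
    *dst = *src;                      /* copy the fixed members */
    if (src->tm_ext != NULL) {
        dst->tm_ext = malloc(src->tm_extlen);
        if (dst->tm_ext == NULL) {
            dst->tm_extlen = 0;       /* dst left with no extension block */
            return -1;
        }
        memcpy(dst->tm_ext, src->tm_ext, src->tm_extlen);
    }
    return 0;
}
```

Under the first fix (implementation-controlled storage) no such function would be needed, which is part of the trade-off the comment describes.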
Other examples of potential pitfalls are highlighted below.
b) This extension mechanism might be difficult to use with
implementations that currently add members to struct tm
(_tm_zone, containing a pointer to a string giving the name of the
time zone, and _tm_gmtoff, whose meaning is almost the same as that
of tm_zone, except that it is 60 times bigger). The latter is
particularly interesting, since it might need tricky kludges to
assure the internal consistency of the struct tmx object (any change
to either member should ideally be applied to the other, yielding
potential rounding problems). Having additional members, accessed
through tm_ext, for example one whose effect is to duplicate the
_tm_zone behaviour, probably looks awful seen this way.
c) 7.23.1p5 states that a positive value for tm_zone means that the
represented broken-down time is ahead of UTC. In the case where
the relationship between the broken-down time and UTC is not known
(thus tm_zone should be equal to _LOCALTIME), it is therefore
forbidden to be positive. This might deserve a more explicit
requirement in 7.23.1p2.
d) POSIX compatibility, as well as proper support of historical
time zones, will require tm_zone to be a count of seconds instead
of a count of minutes; this will in turn require tm_zone to be
enlarged to long (or to int_least32_t), to handle properly
the minimum requirements.
e) POSIX compatibility might be defeated by the restriction
set upon Daylight Saving Time algorithms to actually *advance*
the clocks. This is a minor point, since there is no historical
need, nor any perceived real need, for such a "feature".
f) On implementations that support leap seconds, 7.23.2.2
(difftime) does not specify whether the result should include
(thus considering calendar time to be effectively UTC) or
disregard (thus considering calendar time to be effectively
TAI) leap seconds. This is unfortunate.
g) The requirement set up by 7.23.2.3p4 (a second call to
mktime should yield the same value and should not modify the
broken-down time) is too restrictive for mktime, because
mktime does not allow complete determination of the calendar
time associated with a given broken-down time. Examples
include the so-called "double daylight saving time" that
was in force in the past, or cases where the time zone associated
with the time changes relative to UTC.
For example, in Sri Lanka, the clocks moved back from 0:30
to 0:00 on 1996-10-26, permanently. So the timestamp
1996-10-26T00:15:00, tm_isdst=0 is ambiguous when given to
mktime(); and widely deployed implementations exist that use
caches, and thus might deliver either the former or the latter
result on a random basis; this specification will effectively
disallow caching inside mktime, with a big performance hit
for users.
This requirement (the entire paragraph) should be withdrawn.
Anyway, mktime is intended to be superseded by mkxtime, so
there is not much gain in trying to improve a function that is
to be declared deprecated.
h) The case where mktime or mkxtime is called with tm_zone set
to _LOCALTIME and tm_isdst negative (unknown), and where
the input moment of time is inside the "fall back", that is
between 1:00 am and 2:00 am on the last Sunday in October
(in the United States), leads to a well-known ambiguity.
Contrary to what might have been expected, this ambiguity
is not resolved by the additions of this revision of the Standard
(either result might be returned): it all boils down to the
sentence in 7.23.2.6, in the algorithm, saying
// X2 is the appropriate offset from local time to UTC,
// determined by the implementation, or [...]
Since there are two possible offsets in this case...
i) Assuming the implementation handles leap seconds, if broken-
down times lying in the future are passed (where leap seconds
cannot de facto be determined), 7.23.2.4p4 (effect of _NO_LEAP_
SECONDS on mkxtime), and in particular the sentence in
parentheses, seems to require that the count of leap seconds
be assumed to be 0. This would be ill-advised; I would
prefer it to be implementation-defined, with the recommended
practice (or requirement) of being 0 for implementations that
do not handle leap seconds.
j) Assuming the implementation handles leap seconds, the effect
of 7.23.2.4p4 is that the "default" behaviour on successive
calls to mkxtime yields a new, strange scale of time that is
neither UTC nor TAI. For example (remember that a positive
leap second will be introduced at 1998-12-31T23:59:60Z
in renewed ISO 8601 notation):
struct tmx tmx = {
.tm_year=99, .tm_mon=0, .tm_mday=1, .tm_hour=0, .tm_min=0, .tm_sec=0,
.tm_version=1, .tm_zone=_LOCALTIME, .tm_ext=NULL,
.tm_leapsecs=_NO_LEAP_SECONDS }, tmx0;
time_t t1, t2;
double delta, days, secs;
char s[SIZEOF_BUFFER];
t1 = mkxtime(&tmx);
puts(ctime(&t1));
if( tmx.tm_leapsecs == _NO_LEAP_SECONDS )
printf("Unable to determine number of leap seconds applied.\n");
else
printf("tmx.tm_leapsecs = %d\n", tmx.tm_leapsecs);
tmx0 = tmx; // !!! may share the object pointed to by tmx.tm_ext...
++tmx.tm_year;
t2 = mkxtime(&tmx);
puts(ctime(&t2));
delta = difftime(t2, t1);
days = floor(delta / 86400.0); secs = delta - days * 86400.0;
printf("With ++tm_year: %.7e s == %f days and %f s\n", delta, days, secs);
if( tmx.tm_leapsecs == _NO_LEAP_SECONDS )
printf("Unable to determine number of leap seconds applied.\n");
else
printf("tmx.tm_leapsecs = %d\n", tmx.tm_leapsecs);
tmx = tmx0; // !!! may yield problems if the content pointed to by
// tm_ext have been modified by the previous call...
tmx.tm_hour += 24*365;
tmx.tm_leapsecs = _NO_LEAP_SECONDS;
t2 = mkxtime(&tmx);
puts(ctime(&t2));
delta = difftime(t2, t1);
days = floor(delta / 86400.0); secs = delta - days * 86400.0;
printf("With tm_hour+=24*365: %.7e s == %f days and %f s\n", delta, days, secs);
if( tmx.tm_leapsecs == _NO_LEAP_SECONDS )
printf("Unable to determine number of leap seconds applied.\n");
else
printf("tmx.tm_leapsecs = %d\n", tmx.tm_leapsecs);
Without leap seconds support, results should be consistent and
straightforward, like (for me in Metropolitan France):
Thu Jan 1 01:00:00 1998
Unable to determine number of leap seconds applied.
Fri Jan 1 01:00:00 1999
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
Unable to determine number of leap seconds applied.
Fri Jan 1 01:00:00 1999
With tm_hour+=24*365: 3.1536000e+07 s == 365 days and 0 s
Unable to determine number of leap seconds applied.
Things may change with leap seconds support; assuming we are in a
time zone behind UTC (e.g. in the United States), the results might be:
Wed Dec 31 21:00:00 1997
tmx.tm_leapsecs = 31
Thu Dec 31 21:00:00 1998
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
Thu Dec 31 21:00:00 1998
With tm_hour+=24*365: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
But with a time zone ahead of UTC, results might be
Thu Jan 1 01:00:00 1998
tmx.tm_leapsecs = 31
Thu Dec 31 00:59:59 1998
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
Fri Jan 1 01:00:00 1999
With tm_hour+=24*365: 3.1536001e+07 s == 365 days and 1 s
tmx.tm_leapsecs = 32
And if the time zone is set to UTC, results might be
Thu Jan 1 00:00:00 1998
tmx.tm_leapsecs = 31
Thu Dec 31 23:59:60 1998
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
Fri Jan 1 00:00:00 1999
With tm_hour+=24*365: 3.1536001e+07 s == 365 days and 1 s
tmx.tm_leapsecs = 32
or, for the last three lines
Thu Dec 31 23:59:60 1998
With tm_hour+=24*365: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
The last result is questionable, since both choices are allowed by
the current text (the result falls right inside the leap second
itself). Moreover, implementations with caches might return either
on a random basis...
Bottom line: the behaviour is surprising, to say the least.
k) 7.23.2.6p2 (maximum ranges on input to mkxtime) uses LONG_MAX
submultiples to constrain the members' values. Apart from the fact
that the given limits may easily be made greater in the general
case, this has some defects:
- the constraint disallows the common use, on POSIX boxes, of tm_sec
as the unique input member, set to a POSIX-style time_t value;
- the constraints are very awkward for implementations where
long ints are bigger than "normal" ints: on such platforms, all
members must first be converted to long before any operation
takes place;
- since there are eight main input fields, plus a ninth (tm_zone)
which is further constrained to be between -1439 and +1439, the
result might nevertheless overflow, so special provision to
handle overflow must be made in any event.
l) There is an obvious (and already known) typo in the description
of D, regarding the handling of years that are multiples of 100.
Also, this definition should use QUOT and REM instead of / and %.
m) Footnote 252 introduces the use of these library functions with
the so-called proleptic Gregorian calendar, that is, the rules of
the Gregorian calendar applied to any year, even before Gregory's
reform. This seems to contradict 7.23.1p1, which says that calendar
time's dates are relative to the Gregorian calendar, so that tm_year
should in any case be greater than -318. If this is the intent,
another footnote in 7.23.1p1 might be worthwhile. Another way is to
rewrite 7.23.1p1 to say something like "(according to the rules of
the Gregorian calendar)". See also point l) above.
n) The static status of the returned result of localtime and gmtime
(which is annoying, but that is another story) is clearly set up
by 7.23.3p1. However, this does not scale well to zonetime, given
that this function might in fact return two objects: a struct tmx,
and an optional object containing additional members, pointed to
by tmx->tm_ext.
If the latter is to be static, this might yield problems with mkxtime
as well, since 7.23.2.4p5 states that the broken-down time "produced"
by mkxtime is required to be identical to the result of zonetime.
(This will effectively require that the tm_ext member always point
to a static object held by the implementation; if that is the
original intent, please state it clearly.)
o) There is a direct contradiction between the algorithm given for
asctime in 7.23.3.1, which will overflow if tm_year is not in the
range -11899 to +8099, and the statements in 7.23.2.6 that intend
to encompass a broader range.
All of these points argue for a bigger reworking of this part
of the library. Such a job has been initiated recently, as
Technical Committee J11 is aware. In the meantime, I suggest
dropping all these new features from the current revision of
the Standard.
It means in effect:
i) removing subclauses 7.23.2.4 (mkxtime), 7.23.2.6 (normalization),
7.23.3.6 (strfxtime), 7.23.3.7 (zonetime), 7.24.5.2 (wcsfxtime).
ii) removing paragraphs 7.23.2.3p3 and 7.23.2.3p4 (references to
7.23.2.6),
iii) the macros _NO_LEAP_SECONDS and _LOCALTIME in 7.23.1p2 should be
removed, as they become useless. The same holds for struct tmx
in 7.23.1p3.
iv) 7.23.1p5 (definition of struct tmx) should also be removed, as it
becomes useless too.
__________ end of France Comments; beginning of Japan Comments ______
VOTE ON FCD 9899
JTC 1/SC22 N2794
Information technology - Programming languages -
Programming Language C (Revision of ISO/IEC 9899:1990)
----------------------------------------------------------
The National Body of Japan disapproves ISO/IEC JTC1/SC22 N 2794, VOTE ON
FCD 9899: Information technology - Programming languages - Programming
Language C (Revision of ISO/IEC 9899:1990).
If the following comments are satisfactorily resolved, Japan's vote will
be changed to "Yes".
1. 64-bit data type should be optional
Japan has been claiming a 64-bit data type to be optional for
freestanding environment because it tends to increase the sizes of
program code and run-time library. This situation forces users and
vendors, who do not need to handle any 64-bit data, to implement and use
unnecessary fat conforming implementation. For example, for 16-bit
microprocessors, its compiler vendors and users do not need 64-bit data
type in almost cases.
Of course, Japan knows well that this issue has been discussed within the
WG14 committee for a long time, and recognizes that a majority of the WG14
committee agrees to introduce the 64-bit data type as a mandatory
specification in the ISO C standard. So Japan has decided to agree to make
the 64-bit data type a mandatory specification, provided that the
following requirements are accepted by the committee:
a) In the Rationale document, explicitly state the logical reason why
the 64-bit data type needs to be introduced as a MANDATORY specification
in the ISO C standard, in other words, why the 64-bit data type can NOT
be OPTIONAL. Please understand that this is NOT a request for the reason
why the 64-bit data type is necessary. Japan already understands the
necessity of the 64-bit data type very well. Japan needs the logical
reason why the committee has rejected the proposal of "OPTIONAL".
b) In the Rationale document, explicitly give an effective example of a
conforming implementation which supports the mandatory 64-bit data type
and can reduce the code size of a program to a size similar to (never
larger than) that in an existing C90 conforming implementation, if the
program does not use 64-bit data at all.
c) By some appropriate manner, publish the Rationale document that
includes the above two descriptions.
2. "K.4 Locale-specific behavior" and the wcscoll function
@ Annex K.4
The wcscoll function, defined in sub-clause 7.24.4.4.2, has
locale-specific behavior; therefore, add a reference to the wcscoll
function in Annex K.4.
3. Modify description for the conversion specifiers (e, f, g, a)
@ 7.19.6.1 fprintf function
@ 7.24.2.1 fwprintf function
The description of the conversion specifiers f, F, e, E, g, G, a and A
uses the phrase "a (finite) floating-point number." This kind of
expression using parentheses, "(finite)", is not appropriate in a
programming language standard. It should be changed to a stricter
description using well-defined terms. Japan's concrete proposal was
already presented in the comment attached to the CD approval vote;
please refer to SC22 N 2790 (SoV of the CD vote) and N 2792
(Disposition of CD vote comments).
[More explanation of this issue]
A floating-point number is defined in subclause "5.2.4.2.2
Characteristics of floating types <float.h>". There, floating-point
numbers seem to be categorized as follows:
floating-point number
+
|
+---- normalized floating-point number
|
+---- not normalized floating-point number
+
|
+---- subnormal floating-point number
|
+---- infinities
|
+---- NaNs
|
+---- ...
That is, infinities and NaNs can be interpreted as one category of
"floating-point number". (The standard contains no explicit statement
that infinities and NaNs are NOT floating-point numbers.)
On the other hand, footnote 15 says that "although they are stored
in floating types, infinities and NaNs are not floating-point numbers";
however, footnotes are NOT a normative part of the standard. To make
the standard clearer, this statement should be moved into the normative
text. In that case, removing the word "(finite)" is a sufficient change
to the current description of the conversion specifiers (e, f, g, a).
However, if the committee leaves footnote 15 as is, the descriptions of
e, f, g, and a should be changed as Japan has already proposed.
-----
Cf. SC22 N 2792
Disposition of Comments Report on CD Approval of CD
9899 - Information technology - Programming languages
- Programming Language C (Revision of ISO/IEC
9899:1990)
> 3.8 A double argument for the conversion specifier
>
> > 2. Editorial Comments
> > 14) A double argument for the conversion specifier
> >
> > Sub clause 7.12.6.1 (page 232 - 233 in draft 9) and
> > sub clause 7.18.2.1 (page 308 - 309 in draft 9):
> >
> > In the description about the conversion specifiers f, F, e,
> > E and G of the function f[w]printf,
> > "a double argument representing a floating-point number
> > is..."
> > should be changed to
> > "a double argument representing a normalized
> > floating-point number is..." ^^^^^^^^^^
> > in order to clarify the range and the definition of the
> > double argument.
> >
> > WG14: The Committee discussed this comment, and came to the
> > consensus that this is not an editorial issue, some
> > floating point arithmetic support denormal numbers and
> > infinities.
> > There will need to be a detailed proposal to support
> > this change.
>
> The original intention of Japanese comment is to point out
> that the current description:
>
> "A double argument representing a floating number is
> converted to ...[-]ddd.ddd...
> A double argument representing an infinity is converted
> to ...[-]inf or [-]infinity
> A double argument representing a NaN is converted to ...
> [-]nan or [-]nan(n-char-sequence)..."
>
> is not appropriate as a strict language standard
> specification because "a floating-point number" (defined in
> "5.2.4.2.2 Characteristics of floating types <float.h>"),
> as WG14 mentions above, may include an infinity and a NaN,
> so that the current description can be read as: an infinity
> can be converted to [-]ddd.ddd or [-]inf, and also a NaN can
> be converted to [-]ddd.ddd or [-]nan.
>
> Therefore, Japan re-proposes to change the above description
> to:
>
> "A double argument representing an infinity is converted
> to ...[-]inf or [-]infinity...
> A double argument representing a NaN is converted to ...
> [-]nan or [-]nan(n-char-sequence)..."
> A double argument representing a floating number except
> an infinity and a NaN is converted to ...[-]ddd.ddd..."
>
> This change should be applied to the description about the
> conversion specifier f, F, e, E and G of the function
> f[w]printf().
>
> WG14: Response Code: AL
-----
4. Add UCN to the categories of preprocessing token
@ 6.4 Lexical elements, 1st paragraph of the semantics
Add UCNs to the categories of preprocessing tokens, as described in the
syntax.
5. Replace zero code with null character in mbstowcs
@ 7.20.8.1 The mbstowcs function, Returns
"Terminating zero code" should be changed to the well-defined term
"terminating null character" (defined in 5.2.1).
Cf. 5.2.1 says "a byte with all bits set to 0, called the null
character, ..."
6. Necessary rationale for the changes of some environmental limits
@ "7.19.6.1 The fprintf function" Environmental limits
The minimum value for the maximum number of characters produced by any
single conversion has been changed from 509 (in the current ISO/IEC
9899:1990) to 4095, and some other environmental limits have also been
changed from ISO/IEC 9899:1990. Please give a clear rationale for these
changes.
This change request had already been presented as part of Japan's
comments attached to the CD approval vote. The committee's disposition
of that comment was as follows:
"This was accepted as an editorial change to the Rationale." (Please
refer to SC22 N 2792.) However, the latest draft Rationale, SC22/WG14 N
850, does not contain any description of the environmental-limit
changes pointed out in Japan's comment. Therefore, Japan has decided to
present the same comment as was submitted at the CD approval ballot.
7. Mathematical notations
@ 7.12 Mathematics <math.h>
Many kinds of mathematical notation are used in subclause 7.12,
including unfamiliar ones. Are all of these notations defined in some
ISO standard, or in any other standard? If so, please add the document
name and number to "2. Normative references." If not, please add an
explanation and definition of each mathematical notation in an annex.
8. The nearbyint function: Add reference to Annex F
@ 7.12.9.3 The nearbyint functions, Description
Add the reference "(see F.9.6.3 and F.9.6.4)" to the description of the
nearbyint functions, as is done for the rint functions.
9. Inappropriate sentences
@ 7.3.1 Introduction, 5th paragraph
[#5] Notwithstanding the provisions of 7.1.3, a program is permitted to
undefine and perhaps then redefine the macros complex, imaginary, and I.
@ 7.16 Boolean type and values <stdbool.h>, 4th paragraph
[#4] Notwithstanding the provisions of 7.1.3, a program is permitted to
undefine and perhaps then redefine the macros bool, true, and false.
The sentence "Notwithstanding... a program is permitted to undefine and
perhaps then..." is not appropriate wording for a programming language
standard. Please rewrite the description using well-defined verbs or
auxiliary verbs, e.g., "shall", "may", and so on. For example, "a
program may undefine and then redefine..." seems appropriate.
10. Change the definition of active position back to the original words
of C90
@ 5.2.2 Character display semantics, 1st paragraph
Following France's comment, the description of the active position was
changed as follows:
[#1] The active position is that location on a display device where the
next character output by the fputc or fputwc functions would appear.
The intent of writing a printable character (as defined by the isprint or
iswprint function) to a display device is to display a graphic
representation of that character at the active position and then advance
the active position to the next position on the current line.
This change lacks careful consideration of the treatment of characters,
wide characters, byte-oriented streams, and wide-oriented streams. It
is necessary to distinguish carefully between a character and a wide
character, and between a byte-oriented stream and a wide-oriented
stream. It is also necessary to consider a stream that mixes
byte-oriented and wide-oriented output. As a result of this hasty
change, the description above contains errors. One example is "a
printable character (as defined by ... iswprint function)": the
iswprint function defines a printable *wide* character, not a printable
character.
Therefore, the sentence should be changed back to the original words by
removing "fputwc" and "iswprint", as follows:
[#1] The active position is that location on a display device where the
next character output by the fputc function would appear. The intent of
writing a printable character (as defined by the isprint function) to a
display device is to display a graphic representation of that character
at the active position and then advance the active position to the next
position on the current line.
If the committee wants to change the definition of the active position
from C90, deeper discussion of the character issues is necessary, as is
the agreement of all the national member bodies.
____ end of Japan Comments; beginning of Norway Comments ___________
--------------------------------------------------------------------
ISO/IEC JTC 1/SC22
Title: Programming languages, their environments and system software
interfaces
Secretariat: U.S.A. (ANSI)
---------------------------------------------------------------------
Please send this form, duly completed, to the secretariat indicated above.
---------------------------------------------------------------------
FCD 9899
Title: Information technology - Programming languages - Programming
Language C (Revision of ISO/IEC 9899:1990)
-----------------------------------------------------------------------
Please put an "x" on the appropriate line(s)
Approval of draft
_____ as presented
__X__ with comments as given below (use separate page as annex, if
necessary)
_____ general
__X__ technical
_____ editorial
_____ Disapproval of the draft for reasons below (use separate page as
annex, if necessary)
_____ Acceptance of these reasons and appropriate changes in the text
will change our vote to approval
_____ Abstention (for reasons below)
--------------------------------------------------------------------
P-member voting: NORWAY
Date: 1999-01-08
Signature (if mailed): ULF LEIRSTEIN
---------------------------------------------------------------------
Comment 1.
Category: Correction restoring original intent
Committee Draft subsection: 5.1.1.2
Title: Translation phases 2 and 4 should be allowed to produce \\u....
In phase 2 and 4, behavior is explicitly undefined if new-line removal
or token concatenation produces a sequence like \\u0123, e.g.
printf("\\u\
0123");
I assume this code is intended to be legal, since that \u0123 is not a
universal character name.
The same can happen with token concatenation in phase 4, though I'm not
sure if otherwise well-defined code can produce such sequences. E.g.
#define CAT(a,b) a##b
CAT(a\\u, 0123)
Suggested change (though I hope a more elegant wording is found):
In phase 2 (and 4?), after
character sequence that matches the syntax of a universal
character name
append
and is not immediately preceded by an uneven number of
backslashes
On the other hand, is it supposed to be OK for backslash-newline
removal to turn a UCN-look-alike into a non-UCN?
printf("\\
\u0123");
I don't know if it can cause trouble for a C compiler, but it can
confuse tools that process C program lines in a "state-less" way.
------------------------------------------------------------------------
Comment 2.
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.4.9
Title: Remove `//*' quiet change.
The `//*' quiet change in the Rationale, also illustrated by the last
example of paragraph 6.4.9, is unnecessary:
m = n//**/o
+ p; // equivalent to m = n + p;
(This meant m = n / o + p; in C89.)
Suggested change to 6.4.9:
New constraint: The first (multibyte) character following the `//'
in a `//'-style comment shall not be `*'.
Paragraph 2: Append ..."and to see if the initial character is `*'".
Paragraph 3, last example: Change the note to " // syntax error".
If this is rejected, a milder change could be a `Recommended Practice'
section which recommends to warn about `//*'.
------------------------------------------------------------------------
Comment 3.
Category: Feature that should be included
Committee Draft subsection: 6.10, 6.10.3
Title: Allow `...' anywhere in a macro parameter list.
Parameters after the `...' in a macro could be useful. The preprocessor
does not need to forbid it, since the number of arguments is known when
the macro is expanded. Example:
extern void foo(char *arg, ...); /* Argument list ends with NULL */
#ifdef ENCLOSE
# define foo(..., null) (foo)("{", __VA_ARGS__, "}", null)
#endif
Suggested changes follow.
In 6.10 and A.2.3, add
macro-parameter-list:
identifier-list
identifier-list , ...
identifier-list , ... , identifier-list
...
... , identifier-list
and replace
# define identifier lparen identifier-list-opt )
replacement-list new-line
# define identifier lparen ... ) replacement-list new-line
# define identifier lparen identifier-list , ... )
replacement-list new-line
with
# define identifier lparen macro-parameter-list-opt )
replacement-list new-line
In 6.10.3 paragraph 4, replace
If the identifier-list in the macro definition does not
end with an ellipsis,
with
If the macro-parameter-list does not contain an ellipsis,
and replace `identifier-list' with `macro-parameter-list'.
In paragraph 10, replace
The parameters are
specified by the optional list of identifiers, whose scope
extends from their declaration in the identifier list
with
The parameters are
specified by the optional macro-parameter-list. Their scope
extends from their declarations in the parameter list
In paragraph 12, replace
If there is a ... in the identifier-list in the macro
definition, then the trailing arguments,
with
If there is a ... in the macro-parameter-list, then
arguments from the position matching the ... ,
In the rest of 6.10.3, replace `identifier list' with `parameter list'.
------------------------------------------------------------------------
Comment 4.
Category: Feature that should be included
Committee Draft subsection: 6.10.3
Title: Support empty __VA_ARGS__ by adding __VA_COMMA__
Empty __VA_ARGS__ are not currently allowed, though there are things
they could express that the current definition cannot handle. Example:
#define Msg(type, ...) printf(format[type], __VA_ARGS__)
Suggested change:
1. Allow `...' to receive no arguments.
2. Define an implicit macro parameter __VA_COMMA__ or maybe __VA_SEP__
which expands to `,' if there were arguments and <empty> otherwise.
This allows a possibly-empty __VA_ARGS__ to be used both in front
and at the end of a comma-separated list:
#define SEND(...) Send( __VA_ARGS__ __VA_COMMA__ (void*)0 )
#define Msg(type, ...) printf(format[type] __VA_COMMA__ __VA_ARGS__)
One negative effect is that in macros which need arguments to `...', the
error check for whether there are arguments is lost. The programmer
must supply an extra argument if he wants that check, and e.g. replace
`#__VA_ARGS__' with `#extra_arg #__VA_COMMA__ #__VA_ARGS__'.
That does not seem important, since there is no error check on the rest
of the arguments in any case. Besides, the error will usually cause a
syntax error in translation phase 7.
Still, a workaround could append a `?' to the `...' when the `...' may
receive an empty argument list, or (uglier in my opinion) only allow an
empty argument list if there is a __VA_COMMA__ in the replacement list.
I do not know if `foo(EMPTY)' below should expand to <empty> or `,':
#define foo(...) __VA_COMMA__
#define EMPTY
This does not seem important, so it could probably be standardized as
whatever is easiest to implement. I believe it should expand to `,'.
(It should agree with the __VA_COUNT__ proposal, if that is included.)
Changes to the standard:
6.10.3 paragraph 4: Replace
Otherwise, there shall be more arguments in the invocation than
with
Otherwise, there shall be at least as many arguments in the
invocation as
6.10.3 paragraph 5: Replace
The identifier __VA_ARGS__
with
The identifiers __VA_ARGS__ and __VA_COMMA__
6.10.3.1 - new paragraph 3:
An identifier __VA_COMMA__ that occurs in the replacement list shall
be treated as if it were a parameter. If no arguments were used to
form the variable arguments, __VA_COMMA__ shall receive an empty
argument. Otherwise, it shall receive one `,' token.
6.10.3.5p9: Add an example
#define run(cmd,...) execlp(cmd, cmd __VA_COMMA__ __VA_ARGS__, NULL)
run("man", "cc");
results in
execlp("man", "man" , "cc" , NULL);
------------------------------------------------------------------------
Comment 5.
Category: Correction, Request for information/clarification
Committee Draft subsection: various
Title: Clean up character and encoding concepts
The definitions of and relationships between characters, character
encodings, character sets and C types are scattered throughout the
Standard, and are difficult to figure out even after reading through it:
Seven concepts for characters and their encodings (plus C types):
character, extended character, multibyte character, generalized
multibyte character, wide character, wide-character code/value, byte.
Eight or nine "character sets":
source/execution ch.sets, basic source/execution ch.sets, extended
source/execution ch.sets, required source ch.set, encoding of physical
source files, (encoding of generalized multibyte characters).
Also, the word "character" is used for different concepts: *Encoded*
in bytes (like UTF-8 characters), encoded as a single byte, and
*enumerated* (as in iso-10646). I'm not sure if there are also
"abstract" characters (conceptual entities with a typical graphical
form, which I believe is the correct meaning of "character") in the
standard. This may be part of the reason for the confusion in
discussions about character sets and universal character names.
Note that I don't know if my definitions above are quite correct.
As far as I can tell, the basic, extended and maybe required character
sets are enumerated, and the rest are encoded. Though if the source
character set is encoded, I don't know why translation phase 1 needs to
map another character set to that instead of to an enumerated set.
(The fact that most required characters can be encoded in one byte
doesn't mean the required set can be both encoded and enumerated - an
entity can't be both member of an encoded set and an enumerated set.)
The different character concepts should be spelled out in _one_ section
(5.2.1?), the unqualified word "character" ought to be used for (at
most) one of the concepts above, and at least the character sets that
use other concepts should be renamed - e.g. source character set ->
"encoded source character set" or "multibyte source character set".
This would often lead to cumbersome text, though. The text could be
simplified if this notation is added:
"Character" prepended with "basic", "extended", "source", "execution",
and/or "required" means member of the corresponding character set.
Thus, "basic source character" means "member of the basic source
character set".
If the "source/execution character sets" are renamed to "multibyte
source/execution character sets", the same rule can apply to the word
"multibyte" -- unless there exist multibyte characters that are not
members of the source/execution character sets.
Since the basic/extended character sets contain integer codes for
characters and the source/execution character sets contain multibyte
representations, I suggest that "characters" and "extended characters"
are clearly described as integers or numbered entities, and "bytes" and
"multibyte characters" as encodings.
In any case, please describe all the character types and sets above in
one section:
* which character concept they use,
* the relationship between them:
- which types and sets map to each other,
- which can be subsets or proper subsets of each other, and
which may contain members that do not map to any member in
others they relate to (e.g., I believe a byte may have a value
which does not map to the source/execution character set),
- maybe the character concepts' and character sets' relationship to
char, unsigned char, wchar_t and wint_t.
The character-related definitions in section 3 and their
descriptions/definitions in 5.2.1 should point to each other.
The reference to 5.2.1 in 6.2.5p3 must be updated or removed.
5.2.1 contains a jumbled list of encoded characters (in the
source/execution sets) and enumerated characters (in the basic
source/execution sets). I believe the current definition should
describe the basic/extended character sets, and possibly that the null
character is encoded as a null byte in the execution character set.
(The current words are incorrect, the basic execution character set
doesn't contain a null *byte*.)
7.19.6.1p17 and 7.19.6.2p24 incorrectly refer to "multibyte members of
the extended character set", and 7.20p3(MB_CUR_MAX) to "bytes in a
multibyte character for the extended character set". They should refer
to multibyte characters *representing* members of the extended character
set, or multibyte members of the (encoded) source/exec. character set.
One other detail: considering the definition of "byte" (3.4), the
"multibyte character" definition (3.14) means "sequence of one or more
addressable units...". I think the correct definition of byte is
"either <the current definition>, or bit representation which fits into
same".
Some suggested changes:
3.4 byte
either an addressable unit of data storage large enough to hold any
member of the basic character set of the execution environment,
or bit representation which fits exactly in a byte.
NOTE A byte has no value as such, but it can *encode* a value or
part of a value - e.g. a character. When the "value" of a byte is
mentioned without a related encoding, one usually means encoding
as a binary integer with no sign bit.
[ "no sign bit" (or unsigned, but that indicates C unsigned semantics)
so 0 can't mean "all bits 1" on 1's complement machines. ]
3.5 character
code value (a binary encoded integer [with no sign bit?]) that fits
in a byte
[ this is equivalent to 7.1.1p4 "wide character", except I wasn't sure
if "... that corresponds to a member of the basic character set" could
be included. May a byte have a value which is not a valid character?
May a character have a value which isn't in the basic character set? ]
or a multibyte character or wide character, if this is clear from
the context.
[ E.g. `the null character' and `space character' often means multibyte
characters, in fact null character is defined that way in 5.2.1 -
it is a member of an encoded character set. ]
About source files:
Translation phase 1 maps end-of-line in the physical source character
set to newline in the source character set. Thus, the source character
set (and the required source character set) in 5.2.1 must include
newline. Though it's probably good to keep the reminder in 5.2.1 that
physical source files may have some other representation of end-of-line.
5.2.1.2p1 says that for the source and execution character sets,
-- A byte with all bits zero shall be interpreted as a
null character independent of shift state.
-- A byte with all bits zero shall not occur in the second
or subsequent bytes of a multibyte character.
I believe this means an 8-bit host must either map an UCS-4 physical
source file to e.g. a UTF-8 source character set (the opposite of what a
normal Unicode application would do), or define the compiler to be
running on an emulated machine with 32-bit bytes. If so, why force the
implementation through such games?
However, I suspect the term "source file" in most places except 5.1.1.2
must be read as "the source after translation phase 1", or maybe "after
mapping to the source character set" (possibly except the mapping of
end-of-line to newline).
Maybe the simplest fix is to replace most occurrences of "source file"
with "source code", and define that as suggested above.
And/or move 1st part of translation phase 1 to a new phase 0, and define
the language in terms of phase 0 output.
Usually one can't misunderstand without really trying to, but --
5.2.1 defines the source character set as the encoding of characters
in source files; this definition needs to be correct.
5.2.1.2p2 may contain an exception:
For source files, the following shall hold:
-- An identifier, comment, string literal, character
constant, or header name shall begin and end in the
initial shift state.
I'm not familiar with shift states, so I don't know whether this is
before or after phase 1, or whether that makes a difference.
5.2.1p3 ends with
If any other characters are encountered in a source file
(except (...)), the behavior is undefined.
At least this text should say something like "after translation phase
1" or "mapped to the source character set".
Strings:
7.1.1p1 is slightly incorrect: this is one place where "null character"
means "character or multibyte character", since "string" can mean either
character string or multibyte string.
In 7.1.1p1's sentence
The term multibyte string is sometimes used instead to emphasize
special processing given to multibyte characters contained in the
string or to avoid confusion with a wide string.
maybe "contained" should be replaced with "encoded".
And to complement it, add:
The term character string is sometimes used to emphasize that
each byte in the string contains an integer character code
(converted to char).
Some other matters:
Source characters and multibyte characters are defined in terms of
bytes, which are defined in terms of the execution environment. Thus,
the compilation environment can't have more bits per byte than the
execution environment. This looks strange; if it is intended I think it
must be emphasized.
The ending sentence "If any other characters are encountered in a source
file..." in 5.2.1p3 is placed so it actually means "other than in the
required *execution* character set".
6.4.3p2/p3 refers to the "required character set"; that should be
"required source character set".
The Index needs entries for "required source character set" and
"generalized multibyte character".
6.2.5p5-p6, D.2.1p1, K.1p1 refer to the "values" of bytes. That can be
replaced with the "contents" of bytes, depending on what you do with
the suggested NOTE above about values of bytes.
To make clear that e.g. a yen sign may be used for backslash in C, add
to this wording of 5.2.1p3:
Both the basic source and basic execution character sets shall
have at least the following members:
the text
(though which glyphs these members correspond to on an output
device is unspecified)
------------------------------------------------------------------------
Comment 6.
Category: Change to existing feature, request for clarification
Committee Draft subsection: 5.1.1.2, 6.2.5, 6.4.4, 6.4.5
Title: Failure on source->execution character conversion
In translation phase 5, if a source character has no corresponding
member in the execution character set, it is converted to an
implementation-defined member.
1. Are _all_ source characters lacking a corresponding member in the
execution character set converted to the _same_ character?
2. It should be allowed to let any use of the resulting character cause
a compile-time error. (It should be legal to let compilation fail if
the program may be translated differently than intended, or to let
users request this.)
Other places where error must then be allowed:
6.2.5p3
If any other character [than a required source character] is
stored in a char object, the resulting value is
implementation-defined but shall be within the range of values
that can be represented in that type.
6.4.5p5
The value of a string literal containing a multibyte character or
escape sequence not represented in the execution character set is
implementation-defined.
6.4.4.4p2
An integer character constant is a sequence of one or more
multibyte characters (...) mapped in an implementation-defined
manner to members of the execution character set.
[Incidentally, this says only what the encoding is, not what the
integer value is. This should be mentioned explicitly, in particular
since 6.4.4.4p5-p6 (octal/hex character) does describe the value and
not the encoding.]
6.4.4.4p10
The value of an integer character constant (...) containing a
character or escape sequence not represented in the basic
execution character set, is implementation-defined.
6.4.4.4p11
The value of a wide character constant (...) containing a
multibyte character or escape sequence not represented in the
extended execution character set, is implementation-defined.
------------------------------------------------------------------------
Comment 7.
Category: Request for clarification, or change to existing feature
Committee Draft subsection: 6.10.3.2
Title: Consistent stringification of UCNs
6.10.3.2p2 (The # operator) says:
it is unspecified whether a \ character is inserted before
the \ character beginning a universal character name.
Please clarify: Is the implementation allowed to insert `\' in some
circumstances but not others? May it do so in a single translation
unit? A single preprocessing file?
(Also update this point in K.1 Unspecified behavior).
If yes: It would be useful if implementations are required to behave
consistently in this regard, preferably so that a Configure program can
test *once* e.g. how the compiler treats identifiers and how it treats
strings, and #define a macro accordingly which will be used when a
package is compiled. Unfortunately I'm not sure what kind of
"intelligence" one might want to give compilers here, such as to create
\\u... inside stringified string/character constants and \u... outside
them.
------------------------------------------------------------------------
Comment 8. [Withdrawn]
------------------------------------------------------------------------
Comment 9.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.10.3.5
Title: Preprocessor examples
The examples in 6.10.3.5 (macro scopes) would be better placed in a new
section 6.10.3.6, since only some examples are related to macro scopes.
A few examples with national characters and with universal character
names in non-string tokens would be useful.
------------------------------------------------------------------------
Comment 10.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.4.3
Title: Describe the specified Unicode ranges
_Describe_ the named characters (or print them if you invert part of the
section so that below 000000A0 it says which UCNs are _allowed_), for
the sake of people who don't know Unicode well. (I found that 0000D800
through 0000DFFF are "surrogates", but not what that is, or whether it's
something a compiler accepting national characters in source files
should consider.)
------------------------------------------------------------------------
Comment 11.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.4.4.4
Title: Add note that 'ab' is not a multibyte character
To avoid confusion in `Description' on page 63, add a note that
'ab' is not one multibyte character.
------------------------------------------------------------------------
Comment 12.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 7.20.2.2
Title: Use inttypes in example
Suggestion: Replace `unsigned long int' with uint_fast32_t in the
rand() example.
(Or better, remove the example. It's a poor rand() implementation.)
------------------------------------------------------------------------
Comment 13.
Category: Request for information/clarification
Committee Draft subsection: 5.2.4.1, 6.4.3
Title: Meaning of "character short" identifier
What does the term "character short" identifier mean? It is used
in 5.2.4.1p1 and 6.4.3p2/p4. If it is an ISO-10646 term, you could
copy the definition from there.
------------------------------------------------------------------------
Comment 14.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.4.6, Index
Title: digraphs are not defined.
The index has a reference "digraphs, 6.4.6", but the word "digraph"
does not occur anywhere else. It's a known word and people use it, so:
In 6.4.6p3 after "these six tokens", add "(called _digraphs_)" or
"(known as _digraphs_)", or say so in a note.
-----------------------------------------------------------------------
Comment 15.
Category: Request for clarification
Committee Draft subsection: 6.10.8
Title: Are __TIME__ and __DATE__ constant?
Can __TIME__ and __DATE__ change during the compilation of a single
translation unit?
If yes, I hope they may not change between sequence points. Otherwise
printf("%s at %s", __DATE__, __TIME__) can be inconsistent at midnight.
------------------------------------------------------------------------
Comment 16.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.19.4.2
Title: rename() of/to open files should be implementation-defined
On a UNIX filesystem, `rename' removes (or makes unnamed) the
_destination_ file if it existed. OTOH, a filesystem where the file's
basic handle is the filename may see `rename' as removing the _source_
file. So I believe rename() can have the same problem as remove() if
either of the files are open, and in addition I imagine a FILE* opened
to the destination file may end up pointing into the new file.
Suggested change:
Add this to 7.19.4.2, similar to the text in 7.19.4.1:
If either of the files are open, the behavior of the rename
function is implementation-defined.
------------------------------------------------------------------------
Comment 17.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.7.3
Title: Removing restrict from prototypes
6.7.3p7 says
deleting all instances of the [restrict] qualifier from a
conforming program does not change its meaning
Correction: Replace "program" with "translation unit", or add "and the
headers it includes" after "program".
(Otherwise one could remove `restrict' below but keep `restrict' in
string.h's declaration of strcpy.)
#include <string.h>
char *(*fn) (char * restrict, const char * restrict) = strcpy;
------------------------------------------------------------------------
Comment 18.
Category: Feature that should be included
Committee Draft subsection: 6.3.2.3
Title: C++-style `T** -> T const *const *' conversion
Allow `T**' to be converted to `T const *const *' as in C++.
This text is based on section conv.qual in the C++ standard:
- Text about pointers to members and multi-level mixed pointers has been
removed, as well as the definition of _qv-qualification signature_
which isn't used elsewhere.
- restrict has been added to the list of qualifiers.
- Paragraph 4 about converting functions' prototypes and return types
has been added. Possibly this paragraph should be put elsewhere.
Editorial note: x{y} below is used as a typographical notation for x
subscripted by y. E.g. water would be H{2}O.
Suggested change: Replace 6.3.2.3p1-p2 with the following text:
Qualification conversions
In the following text, each qual{i} or qual{i,j} is zero or more of
the qualifiers const, volatile, and restrict.
[#1] The value of an expression of type "T qual{1} *" can be converted
to type "T qual{2} *" if each qualifier in qual{1} is in qual{2}.
[#2] NOTE Function types are never qualified; see [8.3.5 in C++].
[#3] A conversion can add qualifiers at levels other than the first
in multi-level pointers, subject to the following rules:(->footnote)
Two pointer types T{1} and T{2} are _similar_ if there exists a type T
and integer n > 0 such that:
T{1} is T qual{1,n} *qual{1,n-1} ... *qual{1,1} *qual{1,0}
and
T{2} is T qual{2,n} *qual{2,n-1} ... *qual{2,1} *qual{2,0}
An expression of type T{1} can be converted to type T{2} if and only if
the following conditions are satisfied:
-- the pointer types are similar.
-- for every j > 0, each qualifier in qual{1,j} is in qual{2,j}.
-- if qual{1,j} and qual{2,j} are different, then const is in every
qual{2,k} for 0 < k < j.
[Note: if a program could assign a pointer of type T** to a pointer of
type const T** (that is, if line //1 below was allowed), a program
could inadvertently modify a const object (as it is done on line //2).
For example,
int main() {
const char c = 'c';
char* pc;
const char** pcc = &pc; //1: not allowed
*pcc = &c;
*pc = 'C'; //2: modifies a const object
}
--end note]
[#4] A function pointer of type FT{1} can be converted to a function
pointer of type FT{2} if
- their return types are compatible types, or similar pointer types
where the return type of FT{1} can be converted to the return type
of FT{2},
- and their respective parameters are compatible types, or similar
pointer types where the parameter of FT{2} can be converted to the
type of the equivalent parameter to FT{1}.
__________________
(footnote) These rules ensure that const-safety is preserved by the
conversion.
------------------------------------------------------------------------
Comment 19.
Category: Clarification, Request for clarification
Committee Draft subsection: various
Title: Terms like "unspecified _result_" are used but not defined
The standard uses undefined terms like `implementation-defined value',
`unspecified result', and `undefined conversion state'. Clause 3 only
defines unspecified, implementation-defined and undefined *behavior*.
The term `indeterminate' is important, but the closest to a definition
of it is a mention in passing in 3.18 (undefined behavior).
Suggested changes:
Change
3.19p1 _unspecified behavior_
behavior where ...
and
3.11p1 _implementation-defined behavior_
unspecified behavior where ...
to
3.19p1 _unspecified_
aspect of the language where ...
and
3.11p1 _implementation-defined_
unspecified aspect where ...
However, keep `behavior' in the definition of `undefined behavior',
since that always *is* behavior.
Add a section 3.* _indeterminate_, and add `indeterminate' to the index.
Mention that an indeterminate value can be a trap representation (and
that a trap representation is considered a value, as in 6.2.6.1p5).
6.2.6.1p4 seems to contradict the above:
"Certain object representations need not represent a value of
the object type (...) such a representation is called a trap
representation."
Replace "value" above with "valid value".
In 7.8.1p6, change `may be left undefined' to `need not be #defined'
or change `undefined' to `un#defined'.
In the following places, remove "behavior" or change it to "aspect(s)"
(except when applied to `undefined' if you wish to be that pedantic).
3.11p2
EXAMPLE An example of implementation-defined behavior
3.19p2
EXAMPLE An example of unspecified behavior
4p3
A program (...) containing unspecified behavior
4p5
output dependent on any unspecified, undefined, or
implementation-defined behavior
D.1p1
involves unspecified or undefined behavior
D.2p1
involves undefined or unspecified behavior
K.1 Unspecified behavior
K.3 Implementation-defined behavior
Index
implementation-defined behavior, 3.11, K.3
unspecified behavior, 3.19, K.1
Contents
K.1 Unspecified behavior ......................... 566
K.3 Implementation-defined behavior .............. 585
In the following places, change "undefined" to "indeterminate" or change
the text to say "undefined behavior":
Note 58 to 6.5p2
This paragraph renders undefined statement expressions such as
Note 80 to 6.5.9p6
the effect of subsequent comparisons is also undefined.
7.12.3p1
The result is undefined if an argument is not of real floating
type.
7.24.6.3.2p4, 7.24.6.3.3p4, 7.24.6.4.1p4 and 7.24.6.4.2p4:
the conversion state is undefined.
[Maybe this should be `unspecified'.]
D.1p2
expressions that are not determined to be undefined
Note 288 to D.4.3p3
If it were undefined to write twice to the flags
D.5p27 and D.5p31
and so the expression is undefined.
------------------------------------------------------------------------
Comment 20.
Category: Request for clarification
Committee Draft subsection: various
Title: Some unspecified cases are unclear
In some unspecified and implementation-defined cases, it is unclear
which options the implementation has to choose from.
Suggested changes:
In 3.19, add notes
- with reference to 4p3 and 5.1.2.3 that an unspecified aspect will
not cause failure (i.e. produce a trap representation or undefined
behavior). Even knowledgeable people missed that when I asked.
- that the implementation must make sensible (or maybe "unsurprising")
choices for unspecified aspects.
What are the possible behaviors in the following places?
6.5.7p5
The result of E1 >> E2 is E1 right-shifted E2 bit positions (...)
If E1 has a signed type and a negative value, the resulting value
is implementation-defined.
What are the choices here?
7.19.4.4p3
If it [the tmpnam function] is called more than TMP_MAX
times, the behavior is implementation-defined.
Are there any particular choices?
7.19.7.11p5
the value of its file position indicator after a successful call
to the ungetc function is unspecified
Are there any particular choices, except it must be an fpos_t which is
not a trap representation?
5.2.2p1, 5.2.2p2(\b,\t,\v)
the behavior is unspecified
Is only the output unspecified, or may this also affect program
execution? E.g. may ferror() be set?
Other problems:
6.3.2.3p5
An integer may be converted to any pointer type. The result is
implementation-defined, might not be properly aligned, and might
not point to an entity of the referenced type.
This is an unspecified aspect which may cause undefined behaviour,
which I'm told is not supposed to happen. I think it should be
something like this:
If the integer resulted from converting a pointer [to void?] to
intptr_t, uintptr_t [or a wider integer type?], and the
pointed-to object is still live, the result is a pointer to that
object (see 7.18.1.4). Otherwise the result is indeterminate,
but the implementation shall document it (->footnote XXX).
[footnote XXX] An integer converted to a pointer might not be
properly aligned, and might not point to an entity of the
referenced type.
------------------------------------------------------------------------
Comment 21.
Category: Defect/Request for clarification
Committee Draft subsection: various
Title: The standard implies that pointer = address = integer
This can be deduced as follows:
* C addresses are integers:
3.1p1 mentions "addresses that are particular multiples of a byte
address". Except to mathematicians, only integers are "multiples".
* C addresses are pointers (most of the Standard seems to think so):
6.5.3.2p3 says that "address-of" operator returns a pointer.
This operator's name is the clearest indication I can find of what the
relationship is between pointers and addresses.
6.5.2.5p10 says about "int *p; (...) p = (int [2]){*p}" that
"p is assigned the address of the first element of an array (...)".
* The implementation's hardware address concept is the same as C's
address concept:
This is implied by omission, in particular in 3.1p1. Readers should
not be expected to know that the hardware address concept can be very
different from the common integer-like address concept, so they may
not imagine any reason to not trust hardware addresses to be like C
addresses.
This must be corrected. C's `address' and `pointer' concepts need
explicit definitions and Index entries. The relationship between them,
and between C addresses and hardware addresses, must be stated.
3.4p2 says "It is possible to express the address of each individual
byte of an object uniquely." Maybe this is the right wording: Pointers
_express_ addresses? Or maybe this wording too is wrong - I prefer to
think of pointers as handles to objects and nothing else, and that
anything about addresses just says something about their implementation.
------------------------------------------------------------------------
Comment 22.
Category: Clarification/Correction restoring original intent
Committee Draft subsection: 5.1.1.2
Title: Pragmas are lost after translation phase 4
Pragmas are executed and then deleted in phase 4, so phases 4-6 haven't
really got an approved way to pass on pragmas that take effect in later
phases. The same applies to e.g. letting phase 1 notice and pass on the
character set name of the source file.
Suggested change:
Add this paragraph to 5.1.1.2:
The output from a translation phase may contain extra information
which is transparent to the rules [or "to the grammar"?]
described in this standard.
------------------------------------------------------------------------
Comment 23.
Category: Request for clarification/correction
Committee Draft subsection: 7.19.7.3, 7.19.7.11
Title: "character" arguments to fputc and ungetc
The range of valid arguments to fputc() and ungetc() is unclear. They
are only defined for _character_ arguments, so non-character arguments
produce undefined behaviour (7.19.7.3p2, 7.19.7.11p2/p4). However, a
character is a typeless bit representation according to 3.5. The most
natural interpretation of 3.5 is that the argument must be an integer
which fits in an unsigned char.
The behavior of fputc() should be defined at least for
- all `unsigned char' values, so one can fputc the result of fgetc.
- all `char' values, to avoid breaking a _lot_ of programs that do e.g.
putchar(*string) instead of putchar((unsigned char)*string).
fputc should convert char arguments to unsigned char as it does now.
I do not know if it should also be defined for all signed char values,
or all `int' values.
This does not work for ungetc, since a char value equal to EOF has a
special meaning. Yet ungetc too converts its argument to unsigned char,
so apparently ungetc(ch) where ch is a char is intended to work.
Please state the valid range of the character argument to ungetc. Add a
note that even though ungetc converts its argument to unsigned char, the
application should still convert any char arguments to unsigned char "by
hand", in case the char value == EOF.
------------------------------------------------------------------------
Comment 24.
Category: Request for clarification/Correction
Committee Draft subsection: 7.19.*, 5.2.4.2.1
Title: Problem with non-two's complement char
If INT_MAX - INT_MIN < UCHAR_MAX (e.g., one's complement with
sizeof(int) == 1), there is one byte value (UCHAR_MAX?) which cannot be
read and written to the stream:
- The function call `fputc(u_ch, f)' converts u_ch to int, and then fputc
converts it back to unsigned char and writes that to the stream. This
will typically convert the bit pattern for negative zero to zero; I'm
not sure if it can do it with some other value.
- Similarly, fgetc(f) converts the unsigned char on the stream to int and
returns that, and I'm not even sure it is at liberty to return
"negative zero" so the application could do some hack to notice it.
- 7.19.2p3 may be taken to forbid this (Data read in from a binary
stream shall compare equal to the data that were earlier written out
to that stream), if it is read as 'fwrite() the data, fread() it back,
then memcmp() shall return 0'. However, the 'data read' and 'data
written' can also be taken to mean the output from fgetc() and the
input to fputc(), which have already been through the conversion to
int.
The intent here must be clarified.
If the intent is that INT_MAX - INT_MIN >= UCHAR_MAX, 5.2.4.2.1 would
be a better place to state it. It seems strange to have to search the
library section for such basic restrictions on the language.
____ end of Norway Comments; beginning of United Kingdom comments ____
The UK votes NO on CD2/FCD1.
To change the UK vote to YES:
The following major issues must be addressed:
side effects in VLAs
write-once-read-many changes to restrict
various issues with floating point
For each UK comment listed in column 1 of the table below, the described
change or an equivalent must be made.
For each UK comment listed in column 2 of the table below, the described
change or an equivalent must be made or the
UK National Body must be satisfied that there is good reason to not make
the change.
The UK comments listed in column 3 of the table below must be properly
considered and a reasonable response made.
The changes in these comments are not mandatory provided that such
responses are made.
The procedural matters described below must be addressed.
Column 1
Changes that must be applied
PC-UK0201 [1]
PC-UK0244
PC-UK0246
PC-UK0248
PC-UK0272
PC-UK0273
PC-UK0277
PC-UK0278
PC-UK0286
PC-UK0287
Column 2
Comments that must be addressed
PC-UK0209 [2]
PC-UK0214 [3]
PC-UK0222 [4]
PC-UK0232
PC-UK0245
PC-UK0254
PC-UK0274
PC-UK0275
PC-UK0279
PC-UK0281
PC-UK0282
PC-UK0283
PC-UK0284
PC-UK0285
PC-UK0261
PC-UK0262
PC-UK0269
PC-UK0270
Column 3
Other comments
PC-UK0227 [5]
PC-UK0249
PC-UK0251
PC-UK0252
PC-UK0256
PC-UK0257
PC-UK0276
PC-UK0263
PC-UK0264
PC-UK0265
Notes
1 We do not believe that WG14 addressed the issue actually raised.
2 The UK does not consider this to be a new feature, but a minor
though useful enhancement to an existing one.
3 Previous objection to this item was based on problems with realloc.
Now that the latter has been redefined to create a new object, this
item should be reconsidered.
4 The UK considers that code affected by this item is almost certain
to be erroneous, and feels that it is important that it is
addressed.
5 This item would clarify the meaning of bit-fields, and in
particular that they cannot be wider than specified.
This response also assumes that the following items of SC22/WG14/N847
have been accepted as is or with editorial changes:
4, 8, 10, 19, 20, 21, 33, 36, 43. Otherwise these items should be added
to column 1 of the above table.
Procedural issues
WG14 failed to provide comprehensible responses to a number of matters
raised in the UK comments to CD1. To the extent that those comments are
not subsumed by other parts of this response, they are required to be
addressed in a way that allows the UK to determine whether the WG14
responses are acceptable, and therefore form a part of this submission.
The UK comments in question are:
PC-UK0079 PC-UK0082 PC-UK0083 PC-UK0085 PC-UK0086 PC-UK0088 PC-UK0089
PC-UK0090 PC-UK0091 PC-UK0092 PC-UK0093 PC-UK0094 PC-UK0095 PC-UK0096
PC-UK0097 PC-UK0098 PC-UK0102 PC-UK0106 PC-UK0108 PC-UK0112 PC-UK0114
PC-UK0117 PC-UK0118 PC-UK0120 PC-UK0122 PC-UK0126 PC-UK0129 PC-UK0130
PC-UK0133 PC-UK0134 PC-UK0135 PC-UK0137 PC-UK0138 PC-UK0141 PC-UK0142
PC-UK0143 PC-UK0144 PC-UK0147 PC-UK0150 PC-UK0151 PC-UK0152 PC-UK0153
PC-UK0154 PC-UK0155 PC-UK0156 PC-UK0158 PC-UK0159 PC-UK0161 PC-UK0162
PC-UK0163 PC-UK0164 PC-UK0165 PC-UK0171
Side effects in VLAs
The UK requires the issue of side effects in variably-modified type
declarations and type names to be addressed. A number of proposals have
previously been produced to this end, such as those in PC-UK0226 and
PC-UK0250.
The minimum requirement is that, for any given piece of code, the code
either violates a constraint, or else all implementations produce the same
result (in the absence of any unspecified behaviour in the code not
related to the use of variably-modified types). In particular, it is not
acceptable for side effects to be optional.
The UK preference is to have side effects work normally in
variably-modified types. It would be acceptable for a constraint to
forbid certain operators (such as ++ and function call) within array
sizes within a sizeof expression.
Changes to restrict
There are four basic issues to be addressed:
1.The current specification of restrict disallows aliasing of unmodified
objects, which renders some natural and useful programs undefined
without promoting optimization. This is contrary to the prior art in
Fortran.
2.If a restricted pointer points to an object with const-qualified type,
the current specification allows casting away the const qualifier, and
modifying the object. Disallowing such modifications promotes
optimization as illustrated in example F below.
3.The current specification does not address the effect of accessing
objects through pointers of various types, all based on a restricted
pointer. In particular, these objects are supposed to determine an array,
but the element type is not specified.
4.The specification of realloc now states that the old object is freed,
and a new object is allocated. The old and new objects cannot, in
general, be viewed as being members of an array of such objects. With the
current specification, this appears to prohibit the use of the restrict
qualifier for a pointer that points to an object that is realloc'd. There
are also related issues for dynamically allocated linked lists.
The following changes would address these, though it is accepted that
further discussion may be able to improve them. In these changes, new text
is in bold and removed text in italics.
In 6.7.3.1, amend paragraph 1 to read:
Let D be a declaration of an ordinary identifier that provides a
means of designating an object P as a restrict-qualified pointer to
objects of type T.
Change paragraph 4 to:
During each execution of B, let A be the array object that is determined
dynamically by all references through pointer expressions based on P.
Then all references to values of A shall be through pointer expressions
based on P. Let L(X,B) denote the set of all lvalues that are used to
access object X during a particular execution of B. If T is
const-qualified, and the address of one lvalue in L(X,B) is based on P,
then X shall not be modified during the execution of B. If T is not
const-qualified, the address of one lvalue in L(X,B) is based on P, and X
is modified during the execution of B, then the addresses of all lvalues
in L(X,B) shall be based on P. The requirement in the previous sentence
applies recursively, with P in place of X, with each access of X through
an lvalue in L(X,B) treated as if it modified the value of P, and with
other restricted pointers, associated with B or with other blocks, in
place of P. Furthermore, if P is assigned the value of a pointer
expression E that is based on another restricted pointer object P2,
associated with block B2, then either the execution of B2 shall begin
before the execution of B, or the execution of B2 shall end prior to the
assignment. If these requirements are not met, then the behavior is
undefined.
Alternative wording for the last new sentence ("The requirement ...") is:
If X is modified, the requirement in the previous sentence applies
recursively: P is treated as if it were itself modified and replaces
X in L(X,B), then the same condition shall apply to other restricted
pointers, associated with B or with other blocks, in place of P in
the previous sentence.
Finally, WG14 may wish to consider the following additional change
(rationale is available separately). In 6.7.5.3 paragraph 6, change:
A declaration of a parameter as "array of type" shall be adjusted to
"pointer to type",
to:
A declaration of a parameter as "array of type" shall be adjusted to
"restrict-qualified pointer to type",
and add a new paragraph after paragraph 6:
As far as the constraints of restrict-qualification are concerned
(6.7.3.1), a parameter that is a complete array type shall be
regarded as a pointer to an object of the complete array size; for
all other purposes, its type shall be as described above.
Issues with floating point
Floating-point Unimplementabilities and Ambiguities
The UK comments on CD1 included a large number of comments on CD1 that
have not been addressed in the FCD. Discussions on the reflector indicate
that many of the new features in the language are intended to make sense
only if Annex F or Annex H are in force. This is not reasonable, not least
because it makes the main body of the standard meaningful only in the
context of an informative annex.
It is not reasonable to claim that such problems do not matter because
they cannot be shown in strictly conforming programs.
The same applies to the new features in their entirety, because they are
defined only in certain implementation defined cases.
And the same applies to almost all error and exception handling, even in
C89.
The list of architectures which will have major trouble with the new
proposals includes the IBM 360/370/390 (including the Hitachi S-3600 and
others), the NEC SX-4, the DEC VAX, the Cray C90 and T90, the Hitachi
SR2201, the DEC Alpha (to a certain extent) and almost certainly many
others. Implementors on these will interpret the standard in many, varied
and incompatible ways, because they CANNOT implement the current wording
in any way that makes sense.
For similar reasons, these new features are impossible to use in a
portable program, because it is not possible to determine what they mean,
unless __STDC_IEC_559__ is set. This is not reasonable for features
defined in the main body of the standard.
The standard must be improved so that all such arithmetic-dependent
features are shielded in some suitable way: by a predefined preprocessor
macro, moved to an optional annex, defined so that all reasonable
implementations can support them, or defined to permit an implementation
to indicate that they are not supported. It does not really matter which
approach is adopted.
The following suggestions should remove the worst of the problems, mostly
using the last approach. In most cases, they are trivial extensions that
merely permit the implementor to return an error indication if the
feature cannot be provided, or wording to change ill-defined behaviour
into implementation-defined behaviour.
7.6 Floating-point environment <fenv.h>
Item A
The C standard should not place constraints on a program that are not
determinable by the programmer, and the second and third items of
paragraph 2 do. Many systems use floating-point for some integer
operations or handle some integer exceptions as floating-point - e.g.
dividing by zero may raise SIGFPE, and integer multiplication or division
may actually be done by converting to floating-point and back again.
Either the clause "or unless the function is known not to use floating
point" should be dropped in both cases, or a determinable condition should
be imposed, such as by adding the extra item:
- any function call defined in the headers <complex.h> or <math.h> or
defined elsewhere to access a floating-point object is assumed
to have the potential for raising floating-point exceptions,
unless the documentation states otherwise.
This requires most of the functions in <stdlib.h> to handle exceptions
themselves, if they use floating-point, but that is assumed by much
existing code. It has the merit of at least being determinable, which the
existing wording isn't.
Item B
There is another serious problem, even on systems with IEEE arithmetic,
in that the interaction of the flag settings with setjmp/longjmp is not
well-defined. Should they have the value when setjmp was invoked, when
longjmp was called, or what?
Worse still, the current wording does not forbid setjmp to be invoked
with non-standard modes and longjmp called with default ones, which won't
work in general.
Another item of paragraph 2 should be added:
- if the setjmp macro is invoked with non-default modes, the
behaviour is undefined. The values of the exception flags on return
from a setjmp macro invocation are unspecified.
Item C
A related one concerns the case where a function with FENV_ACCESS set on
calls one with FENV_ACCESS set off - the current wording implies that the
latter must set the flags precisely for the benefit of the former, which
is a major restriction on implementors and makes a complete mockery of
footnote 163.
The second last sentence of paragraph 2 should be changed to:
If part of a program sets or tests flags or runs under non-default
mode settings, ....
7.6.2 Exceptions
These specifications do not allow the implementation any way to indicate
failure. This is (just) adequate for strict IEEE arithmetic, but is a
hostage to fortune and prevents their use for several IEEE-like
arithmetics. All such implementations can do is to not define the macros,
thus implying that they cannot support the functions, whereas they may be
able to support all reasonable use of the functions and merely fail in
some perverse circumstances.
All of these functions (excluding fetestexcept) should be defined with a
return type of int, and to return zero if and only if the call succeeded.
7.6.3.1 The fegetround function
What happens if the rounding mode is none of the ones defined above, or
is not determinable (as can occur)? The following should be added to the
end of paragraph 3:
If the rounding mode does not match a rounding direction macro or is
not determinable, a negative value is returned.
7.6.3.2 The fesetround function
Many existing C <math.h> implementations and even more numerical
libraries have the property that they rely on the rounding mode they are
called with being the one they were built for. To use a different
rounding mode, the user must link with a separate library. The standard
should permit an implementation to reject a change if the change is
impossible, as distinct from when it does not support that mode at all.
Paragraph 3 should be simplified to:
The fesetround function returns a nonzero value if and only if the
requested rounding direction has been established.
Note that this enables the example to make sense, which it doesn't at
present.
7.6.2 Environment
Exactly the same points apply as for 7.6.2 Exceptions above for all the
functions (excluding feholdexcept), and exactly the same solution should
be adopted.
7.12 Mathematics <math.h>
Item A
A major flaw in paragraphs 4 and 5 is that there is no way of specifying
an infinity or a NaN for double or long double, unless float supports
them. While this is the common case, C9X does not require it and it is
not reasonable to do so. In particular, the NEC SX-4 supports IEEE, Cray
and IBM arithmetics, and there are a lot of IEEE systems around which
have non-IEEE long double, and this case cannot be fully supported,
either.
'float' should be changed to 'double' in paragraph 4 and the following
should be added to it:
The macros
INFINITY_F
INFINITY_L
are respectively float and long double analogs of INFINITY.
'float' should be changed to 'double' in paragraph 5 and the following
should be added to it:
The macros
NAN_F
NAN_L
are respectively float and long double analogs of NAN.
Item B
The classification macros are inadequate to classify all numbers on many
architectures - for example, the IBM 370 has unnormalised numbers and the
DEC VAX has invalid ones (i.e. not NaNs.) 5.2.4.2.2 and 7.6 permit other
implementation-defined values, but this section does not.
The following should be appended to paragraph 6:
Additional floating-point classifications, with macro definitions
beginning with FP_ and an uppercase letter, may also be specified by
the implementation.
7.12.1 Treatment of error conditions
I have no idea what "without extraordinary roundoff error" means, and I
have been involved in the implementation and validation of mathematical
functions over 3 decades. My dictionary contains 5 definitions of
"extraordinary", most of which might be applicable, and I know at least 3
separate meanings of the term "roundoff error" in the context of
mathematical functions.
The following paragraph should be added:
If a function produces a range error to avoid extraordinary roundoff
error, the implementation shall define the conditions when this may
occur.
7.12.3.1 The fpclassify macro
As mentioned above, the current wording forbids an implementation from
correctly classifying certain IEEE, IBM 370 and DEC VAX numbers. The first
sentence of paragraph 2 should have appended:
..., or into another implementation-defined category.
7.12.3.2 The signbit macro
The wording of this is seriously flawed. It says that it returns the sign
of a number, but is clearly intended to test the sign bit, and these are
NOT equivalent. IEEE 754 states explicitly that it does not interpret the
sign of NaNs (see section 6.3), and the VAX distinguishes zeroes from
reserved operands (not NaNs, but something much more illegal) by the sign
bit.
And there is nowhere else in C9X that requires the sign of a
floating-point number to be held as a bit - surely people have not
yet forgotten ones' and twos' complement floating point?
Paragraphs 1 and 2 should be replaced by:
For valid non-zero values (including infinities but not NaNs), the
signbit macro returns nonzero if and only if the sign of its argument
is negative.
For zeroes and NaNs when __STDC_IEC_559__ is defined, the signbit
macro returns nonzero if and only if the sign bit of the value is set.
For zeroes and NaNs when __STDC_IEC_559__ is not defined, the signbit
macro returns nonzero for an implementation-defined set of values
and zero otherwise.
7.12.11.1 The copysign functions
What does "treat negative zero consistently" mean? Does IBM 370 arithmetic
do it? Does VAX? Does Intel? Does Cray? Does IEEE?
The sentence "On implementations ... the sign of zero as positive."
should be replaced by one or the other of the following:
Unless __STDC_IEC_559__ is defined (see Annex F), it is
implementation-defined whether any representations of zero are
regarded as negative by this function and, if so, which.
or:
The copysign functions shall interpret the sign of zeroes in the
same way that the signbit macro (7.12.3.2) does.
Floating-point Incompatibilities with Other Standards
The UK comments on CD1 included a large number of points concerning
compatibility with IEC 60559 (IEEE 754) and ISO/IEC 10967-1 (LIA-1) that
have not been addressed in the FCD. It is not reasonable to
claim that such problems do not matter because they cannot be shown in
strictly conforming programs. The same applies to almost all of the
trickier aspects of C89 and C9X floating-point support.
The responses stated that the intention is compatibility only with a
subset of those standards, but those standards do not always allow the
subsetting required by C9X. Furthermore, the statement is not true in all
cases, and it is impossible for an implementation to conform to both
standards simultaneously.
The standard must be improved so that an implementation can reasonably
satisfy both standards simultaneously, in all aspects where C9X claims
that it is compatible with the other standards. Where this is not
possible, C9X should admit the fact in so many words or provide a
mechanism for alternate implementation.
There is also the major point that C9X can and should specify syntax for
such support, in cases where this would avoid implementations providing it
incompatibly. This will then reduce problems if C wishes to support the
feature properly at a later revision. A precedent for this is the signal
handling, which effectively defines syntax but leaves the semantics
almost completely undefined.
Furthermore, there are many places where C9X makes it unnecessarily
difficult to satisfy the other standards, and where minor changes would
have major benefits. These should be improved, and the forthcoming
ISO/IEC 10967-2 (LIA-2) should also be considered in this respect.
The following suggestions should remove the worst of the problems, but
are by no means a complete solution.
5.2.4.2.2 Characteristics of floating types <float.h>
Paragraph 5 doesn't define precisely what the rounding mode terms mean,
and there are many possible interpretations (especially of nearest
rounding for numbers that fall precisely between two values.) Note that
this is specified by IEC 60559 but explicitly not by ISO/IEC 10967-1.
However, the latter requires the rounding method to be documented in full
(see section 8, paragraph f.)
The following should be added to the end of the last sentence:
Unless __STDC_IEC_559__ is defined (see Annex F), the precise
definition of these rounding modes is implementation-defined.
7.3.2 Conventions
This comment is not strictly an incompatibility, but is about wording
likely to cause such problems. The current description of errno handling
is so confusing that it could be interpreted that errno is unpredictable
on return from a complex function. This cannot be the intention. The
second sentence should be replaced by:
An implementation may define domain and range errors, in which case
it will set errno to EDOM or ERANGE and the result to an
implementation-defined value; but it is not required to define them.
7.6 Floating-point environment <fenv.h>
7.12.1 Treatment of error conditions
Annex F IEC 60559 floating-point arithmetic
Annex H Language Independent Arithmetic
One of the assumptions in the IEC 60559 model is that exception flags
will eventually be either cleared or diagnosed, and this is required by
ISO/IEC 10967-1. Fortran does not specify what may be written to
'standard error', but C does, and many vendors regard the standard as
forbidding them from issuing diagnostics in this case. H.3.1.1 states
that C permits an implementation to do this, but provides no hint as to
how. Furthermore, there is no implication in the standard that
floating-point exception flags must have any particular values at any time.
The following should be added to 7.6:
If any of the exception flags are set on normal termination after
all calls to functions registered by the atexit function have
been made (see 7.20.4.3), and stderr is still open, the
implementation may write some diagnostics indicating the fact to
stderr.
If this is not done, then Annex H must be corrected, or clarified to
explain how such a diagnostic can be produced by a conforming
implementation.
7.12 Mathematics <math.h>
F.2.1 Infinities, signed zeroes and NaNs
F.3 Operators and functions
Section 7.12 paragraphs 5 and 6 and F.3 are seriously incompatible with
the spirit of IEC 60559, and are in breach of IEEE 754 section 6.2, by not
providing any way to define a signalling NaN or test for whether a NaN is
signalling or quiet. In particular, an implementation cannot extend the
fpclassify function to conform to both standards, because C9X requires it
to classify both signalling and quiet NaNs as FP_NAN, and IEC 60559
requires it to distinguish them.
Furthermore, the current C9X situation does not allow a programmer to
initialise his data to signalling NaNs (as recommended by IEEE 754). It is
perfectly reasonable not to define the behaviour of signalling NaNs in
general, but it is not reasonable to be unnecessarily hostile to IEC
60559. At the very least, there should be a macro NANSIG for creating
one, which can be used in initialisers, and a macro FP_NANSIG for flagging
one.
There are also implementation difficulties with the wording of fpclassify
as it stands, especially since it may need to copy its argument, and this
is not always possible for signalling NaNs.
7.12 should have the extra paragraph:
The macro
NANSIG
is defined if and only if the implementation supports signalling
NaNs for the double type. It expands to a constant expression of type
double representing an implementation-defined signalling NaN. If
defined, NANSIG may be used as an initializer (6.7.8) for an object
of semantic type double; no other semantics for NANSIG values are
defined by this standard.
The macros
NANSIG_F
NANSIG_L
are respectively float and long double analogs of NANSIG.
Note that it is not possible to have solely a float value of NANSIG,
because of the constraints on copying signalling NaN values.
7.12 paragraph 6 should define the extra symbol:
FP_NANSIG
and add the extra sentence:
This standard does not specify whether the argument of fpclassify is
copied or not, in the sense used by IEC 60559.
F.2.1 paragraph 1 needs replacing by:
The NAN, NAN_F, NAN_L, NANSIG, NANSIG_F, NANSIG_L, INFINITY,
INFINITY_F and INFINITY_L macros in <math.h> provide designations for
IEC 60559 NaNs and infinities.
F.3 last paragraph (starting "The signbit macro") should be simplified by
the omission of the exclusion in brackets - i.e. "(except that fpclassify
does not distinguish signalling from quiet NaNs)".
7.12.1 Treatment of error conditions
Paragraph 2 is in clear conflict with the stated intention of IEC 60559
and ISO/IEC 10967-1, and actually prevents an implementation from
conforming to both C9X and the whole of ISO/IEC 10967-1 simultaneously.
Despite this, H.3.1.2 Paragraph 1 claims that the C standard allows "hard
to ignore" trapping and diagnostics as an alternative form of
notification (as required by ISO/IEC 10967-1), but it specifically FORBIDS
this in many routines of the library that provide the ISO/IEC 10967-1
functionality (as described in H.2.3.2).
This is unacceptable. While there are many possible solutions, this
problem is extremely pervasive, and most of them would involve extensive
changes to C9X. However, SOMETHING needs to be done, and the following
are possibilities:
1. To remove the erroneous claims of ISO/IEC 10967-1 support from
Annex H.
2. To define a pragma to select between the mode where errno is
returned and modes where ISO/IEC 10967-1 "hard to ignore" trapping
and diagnostics are used. Unfortunately, the changes would be
extensive.
3. To add the following to 7.12.1:
An implementation shall provide a mechanism for programs to be
executed as described above. It may also provide a mechanism by
which programs are executed in a mode in which some or all
domain and range errors raise signals in an
implementation-defined fashion.
Recommended practice
If domain errors raise a signal, the signal should be SIGILL.
If range errors raise a signal, the signal should be SIGFPE. It
should be possible for the program to run in a mode where
domain errors and range errors that correspond to overflow raise
signals, but range errors that correspond to underflow do not.
Alternatively, people might prefer to use SIGFPE for both classes of
error; there are arguments both ways, and either choice is
reasonable.
F.9 Mathematics <math.h>
Paragraph 4 is seriously incompatible with the spirit of IEC 60559 and
the wording of ISO/IEC 10967-1. Note that 7.12.1 paragraphs 2 and 3
permit an implementation to define additional domain and range error
conditions, but this section does not.
Paragraph 4 should be changed to:
The invalid exception will be raised whenever errno is set to EDOM.
Subsequent subclauses of this annex specify other circumstances when
the invalid or divide-by-zero exceptions are raised.
There is also a possible ambiguity in paragraphs 5 and 6, and a problem
caused by cases where the implementation may define extra range errors as
permitted by 7.12.1. It should be closed by adding the following:
Whenever errno is set to ERANGE, at least one of the divide-by-zero,
overflow or underflow exceptions shall be raised.
H.3.1 Notification alternatives
H.3.1.2 Traps
ISO/IEC 10967-1 6.1.1 point (c) requires the ability to permit the
programmer to specify code to compensate for exceptions if
trap-and-resume exception handling is used. C does not permit such code,
but H.3.1 paragraph 4 claims that it does. In particular, there is no way
to return a corrected value after a numeric (SIGFPE) signal. Paragraphs 4
of H.3.1 and H.3.1.2 must be corrected, so that they do not claim that C9X
supports ISO/IEC 10967-1 trap-and-resume exception handling.
H.3.1.2 paragraph 4 claims that the C standard allows trap-and-terminate
as well as trap-and-resume. This is not true, either, as C9X stands. In
particular, it does not permit it with exponentF and scaleF implemented
using logb and scalbn etc. Either such termination must be permitted, or
paragraphs 4 of H.3.1 and H.3.1.2 must be corrected; a suggestion is made
for the former elsewhere in this proposal.
Details of PC-UK02xx issues
Category: 1
PC-UK0201
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 4
Title: Further requirements on the conformance documentation
Detailed description:
The Standard requires an implementation to be accompanied by
documentation of various items. However, there is a subtle difference
between the terms "implementation-defined" and "described by the
implementation" which has been missed by this wording (this is partly due
to the tightening up of the uses of this term between C89 and C9X - see
for example subclause 6.10.6).
As a result, the wording does not actually require the latter items to be
documented.
Change the paragraph to:
An implementation shall be accompanied by a document that describes
all features that this International Standard requires to be described
by the implementation, including all implementation-defined
characteristics and all extensions.
PC-UK0244
Category: Inconsistency
Committee Draft subsection: 6.2.5, 6.5.3.4, 6.7
Title: Issues with prototypes and completeness.
Detailed description:
6.2.5p23 says "An array type of unknown size is an incomplete type". Is
the type "int [*]" (which can only occur within a prototype) complete or
incomplete?
If it is complete, then what is its size? This can occur in the
construct
int f (int a [sizeof (int [*][*])]);
If it is incomplete, then the type "int [*][*]" is not permitted, which
is clearly wrong.
Now consider the prototype:
int g (int a []);
The parameter clearly has an incomplete type, but since a parameter is an
object (see 3.16) this is forbidden by 6.7p7:
If an identifier for an object is declared with no linkage, the type
for the object shall be complete by the end of its declarator, or by
the end of its init-declarator if it has an initializer.
This is also clearly not what was intended.
One way to fix the first item would be to change 6.5.3.4p1 to read:
[#1] The sizeof operator shall not be applied to an
expression that has function type or an incomplete type, to
|| an array type with unspecified size 72a), to
the parenthesized name of such a type, or to an lvalue that
designates a bit-field object.
|| 72a) An array type with unspecified size occurs in function
|| prototypes when the notation [*] is used, as in "int [*][5][*]".
One way to fix the second item would be to change 6.7p7 to read:
If an identifier for an object is declared with no linkage, the type
for the object shall be complete by the end of its declarator, or by
the end of its init-declarator if it has an initializer;
|| in the case of function arguments (including in prototypes) this shall
|| be after making the adjustments of 6.7.5.3 (from array and function
|| types to pointer types).
PC-UK0246
Committee Draft subsection: 6.2.5, 6.7.2.2
Title: Circular definition of enumerated types
Detailed description:
6.7.2.2 para 4 says:
Each enumerated type shall be compatible with an integer type.
However, 6.2.5 para 17 says:
The type char, the signed and unsigned integer types,
and the enumerated types are collectively called integer types.
Thus we have a circular definition. To fix this, change the former to one
of:
Each enumerated type shall be compatible with a signed or unsigned
integer type.
or:
Each enumerated type shall be compatible with a standard integer type
or an extended integer type.
PC-UK0248
Committee Draft subsection: 6.3.2.3
Title: Null pointer constants should be castable to pointer types
Detailed description:
6.3.2.3p3 says that:
If a null pointer constant is assigned
to or compared for equality to a pointer, the constant is
converted to a pointer of that type. Such a pointer, called
a null pointer,
However, this doesn't cover cases such as:
(char *) 0
which is neither an assignment nor a comparison. Therefore this is not a
null pointer constant, but rather an implementation-defined conversion from
an integer to a pointer. This is clearly an oversight and should be fixed.
Either change:
If a null pointer constant is assigned
to or compared for equality to a pointer, the constant is
converted to a pointer of that type. Such a pointer, called
a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
to:
If a null pointer constant is assigned
to or compared for equality to a pointer, the constant is
converted to a pointer of that type. When a null pointer
constant is converted to a pointer, the result (called a /null
pointer/) is guaranteed to compare unequal to a pointer to any
object or function.
or change:
If a null pointer constant is assigned
to or compared for equality to a pointer, the constant is
converted to a pointer of that type. Such a pointer, called
a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
[#4] Conversion of a null pointer to another pointer type
yields a null pointer of that type. Any two null pointers
shall compare equal.
to:
If a null pointer constant is assigned
to or compared for equality to a pointer, the constant is
converted to a pointer of that type.
A /null pointer/ is a special value of any given pointer type
that is guaranteed to compare unequal to a pointer to any
object or function. Conversion of a null pointer constant to a
pointer type, or of a null pointer to another pointer type,
yields a null pointer of that type. Any two null pointers
shall compare equal.
PC-UK0272
Committee Draft subsection: 6.5.9
Title: Tidy up of pointer comparison
Detailed description:
Clause 6.5.8, para 5-6: The original wording (6.3.8) contained
a paragraph between these two. The first sentence of this paragraph
has been moved to paragraph 5. The second sentence ("If two pointers
to object or incomplete types compare equal, both point to the same
object, or both point one past the last element of the same array
object.") does not appear in the FCD. The sentence needs
to be in the FCD so that all cases are covered.
Append to 6.5.9 para 6:
Otherwise the pointers compare unequal.
PC-UK0273
Committee Draft subsection: 6.7.5.3
Title: Forbid incomplete types in prototypes
Detailed description:
Clause 6.7.5.3, para 8: This allows constructs such as:
void f1(void, char);
struct t;
void f2(struct t);
Allowing incomplete types in prototypes is only necessary to support
[*] (another UK proposal would make this a complete type). If
certain incomplete types are allowed in prototypes they need to
be explicitly called out. Otherwise the behaviour should be
made a constraint violation.
Remove the words "may have incomplete type and" from the cited paragraph.
Words should be added somewhere to make it clear that [*] arrays are
complete types; for example, in 6.7.5.2 para 3 change:
... in declarations with function prototype scope.111) ...
to
... in declarations with function prototype scope 111); such
arrays are nonetheless complete types ...
See also PC-UK0244.
PC-UK0277
Committee Draft subsection: 6.2.6.1, 6.5.2.3
Title: Effects on other members of assigning to a union member
Detailed description:
6.5.2.3p5 has wording concerning the storing of values into a union
member:
With one exception, if the value of a member of a union object is
used when the most recent store to the object was to a different
member, the behavior is implementation-defined.
When this wording was written, "implementation-defined" was interpreted
more loosely and there was no other relevant wording concerning the
representation of values. Neither of these is the case anymore.
The requirement to be implementation-defined means that an implementation
must ensure that all stored values are valid in the types of all the other
members, and eliminates the possibility of them being trap representations.
It also makes it practically impossible to have trap representations at all.
This is not the intention of other parts of the Standard.
It turns out that the wording of 6.2.6.1 is sufficient to explain the
behavior in these circumstances. Type punning by unions causes the same
sequence of bytes to be interpreted according to different types. In
general, 6.2.6.1 makes it clear that bit patterns can be trap values, and
so the programmer can never be sure that the value is safe to use in a
different type.
One special case that should be considered is the following:
union {
unsigned short us;
signed int si;
unsigned int ui;
} x;
If x.si is set to a positive value, the requirements of 6.2.5 and 6.2.6.1
will mean that x.ui holds the same value with the same representation.
This appears to be a reasonable assumption. A similar thing happens if
x.ui is set to a value between 0 and INT_MAX. If x.si is set to a negative
value, or x.ui to a value greater than INT_MAX, the other might be set to
a trap representation if there are any padding bits; if there are none,
then it must be the case that the extra bit in x.ui overlaps the sign bit
of x.si. Finally, if x.ui is set to some value and x.us does not have any
padding bits and does not overlap any padding bits of x.ui, then x.us will
have some value determinable from the arrangement of bits in the two types
(this might be the low bits of x.ui, the high bits, or some other
combination).
None of these cases should be particularly surprising.
The cited wording in 6.5.2.3 merely muddles the issue by implying that
all possible members take sensible (non-trap) values. It should be removed;
the rest of the paragraph can stand alone.
Committee Draft subsection: 7.19.8.1, 7.19.8.2
PC-UK0278
Title: Clarify the actions of fread and fwrite
Detailed description:
The exact behaviour of fread and fwrite is not well specified, particularly
on text streams but in actuality even on binary streams. For example, the
wording does not require the bytes of the object to be output in the order
they appear in memory, but would allow (for example) byte-swapping. It is
reported that at least one implementation converts to a text form such as
uuencoding on output to a text stream, converting back on input. And,
finally, there is not even a requirement that data written out by fwrite
and read back by an equivalent call to fread has the same value.
These changes apply the obvious semantics.
In 7.19.8.1p2, add after the first sentence:
For each object, /size/ calls are made to the /fgetc/ function and
the results stored, in the order read, in an array of /unsigned char/
exactly overlaying the object.
In 7.19.8.2p2, add after the first sentence:
For each object, /size/ calls are made to the /fputc/ function, taking
the values (in order) from an array of /unsigned char/ exactly
overlaying the object.
PC-UK0286
Committee Draft subsection: 7.6, 7.6.3
Title: Inconsistencies in fesetround
Detailed description:
It is not clear from the text whether an implementation is required to
allow the floating-point rounding direction to be changed at runtime. The
wording of 7.6p7 implies that those modes defined must be settable at
runtime ("... supports getting and setting ..."), but if this is the case
then 7.6.3.2p4 (example 1) would have no need to call assert on the results
of the fesetround call, since that call could not fail (if FE_UPWARD is not
defined the code is in error). On the other hand, 7.6.3.2p2 implies that
setting succeeds whenever the argument is one of the defined FE_ macros,
and in any case 7.6.3.2p3 is ambiguous.
Even if the mode cannot be set at runtime, it may be the case that the
code is compiled under more than one mode, and it is therefore convenient
to be able to retrieve the current mode.
Option A
--------
If the intention is that there may be rounding modes that can be in effect
but cannot be set by the program, then make the following changes:
Change 7.6p7 to:
Each of the macros
FE_DOWNWARD
FE_TONEAREST
FE_TOWARDZERO
FE_UPWARD
is defined if the implementation supports this rounding direction
and it might be returned by the fegetround function; it might be
possible to set this direction with the fesetround function.
Additional rounding directions ... [unchanged]
Change 7.6.3.2p2 to:
The fesetround function attempts to establish the rounding direction
represented by its argument round. If the new rounding direction cannot
be established, the current direction is left unchanged.
Change 7.6.3.2p3 to:
The fesetround function returns a zero value if and only if the
new rounding direction has been established.
Example 1 is valid in this option, though it may be desirable to replace
the assert by some other action.
Option B
--------
If the intention is that an implementation must allow the mode to be set
successfully to any FE_* macro that is defined, then make the following
changes:
Change 7.6.3.2p3 to:
The fesetround function returns a zero value if and only if the
argument is equal to a rounding direction macro defined in the
header (that is, if and only if the requested rounding direction
is one supported by the implementation).
In 7.6.3.2p4, change the lines:
setround_ok = fesetround (FE_UPWARD);
assert(setround_ok);
to:
#if !defined (FE_UPWARD)
#error "FE_UPWARD not supported"
#endif
fesetround(FE_UPWARD); // Can't fail
and delete the declaration of setround_ok.
PC-UK0287
Committee Draft subsection: 7.13, 7.13.2.1
Title: Clarify what the setjmp "environment" is
Detailed description:
Much of the state of the abstract machine is not stored in "objects"
within the meaning of the Standard. It needs to be clear that such state
is not saved and restored by the setjmp/longjmp combination.
Append to 7.13p2:
The environment of a call to the setjmp macro consists of
information sufficient for a call to the longjmp function to return
execution to the correct block and invocation of that block (if it
were called recursively). It does not include the state of the
floating-point flags, of open files, or of any other component of
the abstract machine.
In 7.13.2.1p3, change:
All accessible objects have values as of the time ...
to
All accessible objects have values, and all other components of the
abstract machine [*] have the same state as of the time ...
[*] This includes but is not limited to the floating-point flags and
the state of open files.
It also needs to be clear that optimisers need to take care. Consider
the following code:
jmp_buf env;
int v [N * 2];
for (int i = 0; i < N; i++)
{
v [2*i] = 0;
if (setjmp (env))
// ...
v [2*i+1] = f (); // might call longjmp; note i hasn't changed
}
This might be optimised as if written as:
jmp_buf env;
int v [N * 2];
for (ii = 0; ii < 2 * N; )
{
v [ii] = 0;
if (setjmp (env))
// ...
ii++;
v [ii] = f (); // might call longjmp
ii++;
}
Such code would be allowed to make ii indeterminate after a longjmp, but
the original code would not. It should be made clear, perhaps through an
example, that such an optimisation must allow for the possibility of a
longjmp happening.
Category 2
PC-UK0209
Committee Draft subsection: 6.10.3
Title: Add a __VA_COUNT__ facility for varargs macros
Detailed description:
Unlike with function calls, it is trivial for an implementation to
determine the number of arguments that match the ... in a varargs macro.
There are a number of useful things that can be done with this (at the
least, providing argument counts to varargs functions). Therefore this
information should be made available to the macro expansion.
In 6.10.3p5, change
The identifier /__VA_ARGS__/ ...
to:
The identifiers /__VA_ARGS__/ and /__VA_COUNT__/ ...
Append to 6.10.3.1p2:
An identifier /__VA_COUNT__/ that occurs in the replacement list
shall be replaced by a single token which is the number of trailing
arguments (as a decimal constant) that were merged to form the variable
arguments.
PC-UK0214
Committee Draft subsection: 6.3.4, plus scattered other changes
Title: better terminology for object lifetimes
Detailed description:
The term "lifetime" is used at a few places in the Standard but never
defined. Meanwhile a number of places uses circumlocutions such as "while
storage is guaranteed to be reserved". These would be much easier to read
if the term "lifetime" was defined and used.
Make the following changes to subclause 6.3.4.
Delete paragraph 5 and insert a new paragraph between 1 and 2:
The /lifetime/ of an object is the portion of program execution
during which storage is guaranteed to be reserved for that object.
An object exists and retains its last-stored value throughout its
lifetime. Objects with static or automatic storage duration have a
constant address throughout their lifetime.23) If an object is referred
to outside its lifetime, the behavior is undefined. The value of a
pointer is indeterminate after the end of the lifetime of the object
it points to.
Change paragraphs 2 to 4 (which will become 3 to 5) to:
[#2] An object whose identifier is declared with external or
internal linkage, or with the storage-class specifier static,
has static storage duration. The lifetime of the object is the
entire execution of the program. Its stored value is initialized
only once.
[#3] An object whose identifier is declared with no linkage
and without the storage-class specifier static has automatic
storage duration. For objects that do not have a variable
length array type, the lifetime extends from entry into the
block with which it is associated until execution of the block
ends in any way. (Entering an enclosed block or calling a function
suspends, but does not end, execution of the current block.)
If the block is entered recursively a new object is created each
time. The initial value of the object is indeterminate; if an
initialization is specified for the object, it is performed each
time the declaration is reached in the execution of the block;
otherwise, the value becomes indeterminate each time the
declaration is reached.
[#4] For objects that do have a variable length array type, the
lifetime extends from the declaration of the object until execution
of the program leaves the scope of that declaration.24) If the scope
is entered recursively a new object is created each time. The initial
value of the object is indeterminate.
Other changes:
In 5.1.2p1 change "in static storage" to "with static storage duration".
Change footnote 9 to:
9) In accordance with 6.2.4, a call to exit will remain within the
lifetime of objects with automatic storage duration declared in main
but a return from main will end their lifetime.
Delete 5.1.2.3p5 as it just duplicates material in 6.2.4p3-4.
Change the last portion of 6.5.2.5p17 to:
of the loop only, and on entry next time around p would be
pointing to an object outside of its lifetime, which would
result in undefined behavior.
Change the last portion of footnote 72 to:
and the address of an automatic storage duration object after the
end of its lifetime.
Change the first sentence of 6.7.3.1p5 to:
Here an execution of B means the lifetime of a notional object with
type /char/ and automatic storage duration associated with B.
Add to 7.20.3 a second paragraph:
The lifetime of an object allocated by the calloc, malloc, or
realloc functions extends from the function call until the
object is freed by the free or realloc functions. The object has a
constant address throughout its lifetime except when moved by a call
to the realloc function.
The last sentence of 7.20.3p1 is redundant and could be deleted.
Relevant bullet points in annex K should also be changed.
PC-UK0222
Committee Draft subsection: 6.7.2.1
Title: Bitfields of unsupported types should require a diagnostic.
Detailed description:
If a bitfield is declared with a type other than /_Bool/ or plain, signed,
or unsigned int, the behavior is undefined. Since this can easily be
determined at compile time, a diagnostic should be required. It is
reasonable to exempt other integer types that the implementation knows how
to handle.
Add to the end of 6.7.2.1p3:
A bit-field shall have a type that is a qualified or unqualified
version of /_Bool/, /signed int/ or /unsigned int/, or of some other
implementation-defined integer type.
Delete the first sentence of 6.7.2.1p8.
Note that this wording allows additional implementation-defined bitfield
types so long as they are integers. An implementation may also accept
non-integer bitfield types as an extension, but since the Standard would
not define their behaviour, a diagnostic would still be required.
PC-UK0232
Committee Draft subsection: 7.19.2, 7.24.3.5, 7.24.6
Title: Better locale handling for wide oriented streams
Detailed description:
7.19.2p6 associates an /mbstate_t/ object with each stream, and 7.19.3p11-13
state that this is used with the various I/O functions. On the other hand,
7.24.6p3 places very strict restrictions on the use of such objects,
restrictions that cannot be met through the functions provided in the
Standard while allowing convenient use of wide formatted I/O.
Furthermore, an /mbstate_t/ object is tied to a single locale based on the
first time it is used. This means that a wide oriented stream is tied to
the locale in use the first time it is read or written. This will be
surprising to many users of the Standard.
Therefore, at the very least these objects should be exempt from the
restrictions of 7.24.6; the restrictions of 7.19 (for example, 7.19.2p5
bullet 2) are sufficient to prevent unreasonable behaviour. In addition,
the locale of the object should be fixed, and not affected by the current
locale. The most sensible way to do this is to use the locale in effect
when the file is opened, but allow /fwide/ to override this.
In 7.19.2p6, add after the first sentence:
This object is not subject to the restrictions on direction of use
and of locale that are given in subclause 7.24.6. All conversions
using this object shall take place as if the /LC_CTYPE/ category
setting of the current locale is the setting that was in effect when
the orientation of the stream was set with the /fwide/ function or,
if this has not been used, when the stream was opened with the
/fopen/ or /freopen/ function.
In 7.24.3.5, add a new paragraph after paragraph 2:
If the stream is successfully made wide oriented, the /LC_CTYPE/
category that is used with the /mbstate_t/ object associated with
the stream shall be set to that of the current locale.
In 7.24.6p3, append:
These restrictions do not apply to the /mbstate_t/ objects associated
with streams.
PC-UK0245
Committee Draft subsection: 6.2.5, 6.7
Title: Problems with flexible array members
Detailed description:
Sometime after CD1 the following wording was added to 6.2.5p23:
A structure type containing a flexible array member is an
incomplete type that cannot be completed.
Presumably this was done to eliminate some conceptual problems with
structures that contain such members. However, this change makes almost
all use of such structures forbidden, because it is no longer possible to
take their size, and it is unclear what other operations are valid. This
was also not the intent behind the original proposal.
On the other hand, if such a structure is a complete type, there are a
number of issues to be defined, such as what happens when the structure
is copied or initialized. These need to be addressed.
The wording defining flexible array members is in 6.7.2.1p15:
[#15] As a special case, the last element of a structure
with more than one named member may have an incomplete array
type. This is called a flexible array member, and the size
of the structure shall be equal to the offset of the last
element of an otherwise identical structure that replaces
the flexible array member with an array of unspecified
length.95) When an lvalue whose type is a structure with a
flexible array member is used to access an object, it
behaves as if that member were replaced with the longest
array, with the same element type, that would not make the
structure larger than the object being accessed; the offset
of the array shall remain that of the flexible array member,
even if this would differ from that of the replacement
array. If this array would have no elements, then it
behaves as if it had one element, but the behavior is
undefined if any attempt is made to access that element or
to generate a pointer one past it.
A solution to the problem is to leave the structure as complete but have
the flexible member ignored in most contexts. To do this, delete the last
sentence of 6.2.5p23, and change 6.7.2.1p15 as follows:
[#15] As a special case, the last element of a structure
with more than one named member may have an incomplete array
|| type. This is called a flexible array member. With two
|| exceptions the flexible array member is ignored. Firstly, the
|| size of the structure shall be equal to the offset of the last
element of an otherwise identical structure that replaces
the flexible array member with an array of unspecified
length.95) Secondly, when the . or -> operator has a left
|| operand which is, or is a pointer to, a structure with a flexible
|| array member and the right operand names that member, it
behaves as if that member were replaced with the longest
array, with the same element type, that would not make the
structure larger than the object being accessed; the offset
of the array shall remain that of the flexible array member,
even if this would differ from that of the replacement
array. If this array would have no elements, then it
behaves as if it had one element, but the behavior is
undefined if any attempt is made to access that element or
to generate a pointer one past it.
Finally, add further example text after 6.7.2.1p18:
The assignment:
*s1 = *s2;
only copies the member n, and not any of the array elements.
Similarly:
struct s t1 = { 0 }; // valid
struct s t2 = { 2 }; // valid
struct ss tt = { 1, { 4.2 }}; // valid
struct s t3 = { 1, { 4.2 }}; // error; there is nothing
// for the 4.2 to initialize
t1.n = 4; // valid
t1.d [0] = 4.2; // undefined behavior
PC-UK0254
Committee Draft subsection: 7.8
Title: Missing functions for intmax_t values
Detailed description:
Several utility functions have versions for types int and long int, and
when long long was added corresponding versions were added. Then when
intmax_t was added to C9X, further versions were provided for some of
these functions. However, three cases were missed. For intmax_t to be
useful to the same audience as other features of the Standard, these three
functions should be added. Obviously they should be added to <inttypes.h>.
Add a new subclause 7.8.3:
7.8.3 Miscellaneous functions
7.8.3.1 The atoimax function
Synopsis
#include <inttypes.h>
intmax_t atoimax(const char *nptr);
Description
The atoimax function converts the initial portion of the string
pointed to by nptr to intmax_t representation. Except for the
behaviour on error, it is equivalent to
strtoimax(nptr, (char **)NULL, 10)
The function atoimax need not affect the value of the integer
expression errno on an error. If the value of the result cannot
be represented, the behavior is undefined.
Returns
The atoimax function returns the converted value.
7.8.3.2 The imaxabs function
Synopsis
#include <inttypes.h>
intmax_t imaxabs(intmax_t j);
Description
The imaxabs function computes the absolute value of an integer j.
If the result cannot be represented, the behavior is undefined.
Returns
The imaxabs function returns the absolute value.
7.8.3.3 The imaxdiv function
Synopsis
#include <inttypes.h>
imaxdiv_t imaxdiv(intmax_t numer, intmax_t denom);
Description
The imaxdiv function computes numer/denom and numer%denom in a
single operation.
Returns
The imaxdiv function returns a structure of type imaxdiv_t,
comprising both the quotient and the remainder. The structure shall
contain (in either order) the members quot (the quotient) and rem
(the remainder), each of which has the type intmax_t. If either
part of the result cannot be represented, the behavior is undefined.
7.8 paragraph 2 will need consequential changes.
PC-UK0274
Committee Draft subsection: 6.3.1.3
Title: Clarify the semantics of integer conversions
Detailed description:
Clause 6.3.1.3, para 2. The bit about converting the type, in
C90 (6.2.1.2), has been deleted. What happens when an object of
type int having a value of -1 is assigned to an unsigned long? If
the adding and subtracting described in para 2 is performed on the
int object the resulting value will be different than if the original
value was first promoted to long (assuming int and long have a different
width).
Add footnote (see footnote 28 in C90) stating that the rules describe
arithmetic on the mathematical value, not on the value of a given
type of expression.
PC-UK0275
Committee Draft subsection: 6.6
Title: lacuna in sizeof/VLA interactions in constant expressions
Detailed description:
Clause 6.6, para 8: This paragraph does not contain the
extra wording on sizeof "... whose results are integer constants, ..."
found in para 6. This needs to be added.
PC-UK0279
Committee Draft subsection: 6.2.6.2
Title: Remove or clarify one's complement and sign-and-magnitude
Detailed description:
Subclause 6.2.6.2p2 makes it clear that there are only three permitted
representations for signed integers - two's complement, one's complement,
and sign-and-magnitude. It is reported, however, that certain historical
hardware using the latter two have problems with the "minus zero"
representation. Software not written with minus zero in mind can also run
into problems; for example, the expressions:
0 & 1
or
(-2 + 2) & 1
might turn out to be true because a minus zero has happened (bit operators
are defined to act on the bit patterns, so this is an issue). It is
inconvenient to have to code defensively around this problem, and most
programmers are probably not even aware of it.
However, enquiries have failed to identify any existing platform that does
not use two's complement, and so the time may have come to require it as
part of C. This approach is addressed in option A below. If WG14 is not
willing to do this, the changes in option B deal with the issues of minus
zero, by forbidding it from appearing unexpectedly.
Option A
--------
Change the last part of 6.2.6.2p2 from:
If the sign bit is zero, it shall not affect the resulting value. If
the sign bit is one, then the value shall be modified in one of the
following ways:
-- the corresponding value with sign bit 0 is negated;
-- the sign bit has the value -2^N;
-- the sign bit has the value 1-2^N.
to:
The sign bit shall have the value -2^N.[*]
and add the footnote:
[*] This is often known as 2's complement.
Consequential changes will be required in 5.2.4.2.1, 7.18.2, and 6.7.7,
and possibly elsewhere.
Option B
--------
Change the last part of 6.2.6.2p2 from:
If the sign bit is one, then the value shall be modified in one of the
following ways:
-- the corresponding value with sign bit 0 is negated;
-- the sign bit has the value -2^N;
-- the sign bit has the value 1-2^N.
to:
If the sign bit is one, then the value shall be modified in one of the
following ways:
-- the corresponding value with sign bit 0 is negated (/sign and
magnitude/);
-- the sign bit has the value -2^N (/two's complement/);
-- the sign bit has the value 1-2^N (/one's complement/).
The implementation shall document which shall apply, and whether the
value with sign bit 1 and all value bits 0 (for the first two), or
with sign bit and all value bits 1 (for one's complement) is a trap
representation or a normal value. In the case of sign and magnitude
and one's complement, if this representation is a normal value it is
called a /negative zero/.
and insert two new paragraphs immediately afterwards:
If the implementation supports negative zeros, then they shall only be
generated by:
- the & | ^ ~ << and >> operators with appropriate arguments;
- the + - * / and % operators where one argument is a negative zero
and the result is zero;
- compound assignment operators based on the above cases.
It is unspecified if these cases actually generate negative zero or
normal zero, and whether a negative zero becomes a normal zero or
remains a negative zero when stored in an object.
If the implementation does not support negative zeros, the behavior
of an & | ^ ~ << or >> operator with appropriate arguments is undefined.
PC-UK0281
Committee Draft subsection: 6.10
Title: Parsing ambiguity in preprocessing directives
Detailed description:
Consider parsing the following text during the preprocessing phase
(translation phase 4):
# if 0
xxxx
# else
yyyy
# endif
The third line fits the syntax for the first option of group-part, and thus
generates two possible parsings. One of these will cause both text lines to
be skipped, while the other only causes the second to be skipped.
To fix this ambiguity, change group-part in the syntax in 6.10p1 to:
group-part:
non-directive new-line
if-section
control-line
and add:
non-directive:
pp-tokens/opt
Then add a new paragraph to the Constraints, after 6.10p3:
If the first preprocessing-token (if any) in a non-directive is
/#/, the second shall be an identifier other than one of those
that appears in the syntax in this subclause. Such a non-directive
shall only appear in a group that is skipped.
This change has two (deliberate) side-effects: unknown preprocessing
directives require a diagnostic if not skipped, and in any case cannot affect the
state of conditional inclusion. If such a directive is recognised by the
implementation, it can still interpret it in any desired way after
outputting the diagnostic.
PC-UK0282
Committee Draft subsection: 6.10.8
Title: provide a __STDC_HOSTED__ macro
Detailed description:
There is currently no way for a program to determine if the implementation
is hosted or freestanding. A standard predefined macro should be provided.
Add to the list in 6.10.8p1:
__STDC_HOSTED__ The decimal constant 0 if the implementation is a
freestanding one and the decimal constant 1 if it is
a hosted one.
Note: it has been suggested that this is difficult to provide when there
is an independent preprocessor because it will not know what language the
compiler is handling or what library is available, but the same points
apply to the standard headers, to __STDC_VERSION__, to __STDC_IEC_559__,
and so on; if these can be handled correctly by such an implementation, so
can __STDC_HOSTED__.
PC-UK0283
Committee Draft subsection: 7.14.1.1, 7.20.4
Title: _Exit function
Detailed description:
As part of a working paper (N789), I suggested that C provide an _exit()
function like that in POSIX, and signal handlers should be allowed to
call this function. The Menlo Park meeting agreed to add this function
unless an unresolvable technical issue was found that would make it not
conformant to POSIX.
The Santa Cruz meeting decided not to include this function because they
felt that there was a possibility of conflict with POSIX. The functionality
is still needed, as without it there is no safe way to leave a signal
handler, and so it is being resubmitted with a new name in the implementer's
namespace.
In 7.14.1.1p5, change:
or the signal handler calls any function in the standard library
other than the /abort/ function or the /signal/ function
to:
or the signal handler calls any function in the standard library
other than the /abort/ function, the /_Exit/ function, or the
/signal/ function
Add a new subclause within 7.20.4:
7.20.4.X The _Exit function
Synopsis
#include <stdlib.h>
void _Exit (int status);
Description
The /_Exit/ function causes normal program termination to occur,
and control to be returned to the host environment. No functions
registered by the /atexit/ function or signal handlers registered by
the /signal/ function are called. The /_Exit/ function never returns
to the caller. The status returned to the implementation is determined
in the same manner as for the /exit/ function. It is implementation-
defined whether open output streams are flushed, open streams closed,
or temporary files removed.
PC-UK0284
Committee Draft subsection: 6.10.3
Title: Problems with extended characters in object-like macros
Detailed description:
When an object-like macro is #defined, there is no requirement for a
delimiter between the macro identifier and the replacement list. This can
be a problem when extended characters are involved - for example, some
implementations view $ as valid in a macro identifier while others do not.
Thus the line:
#define THIS$AND$THAT(x) ((x)+42)
can be parsed in either of two ways:
Identifier Arguments Replacement list
THIS - $AND$THAT(x) ((x)+42)
THIS$AND$THAT x ((x)+42)
TC1 addressed this by requiring the use of a space in certain circumstances
so as to eliminate the ambiguity. However, this requirement has been
removed in C9X for good reasons. Regrettably this reintroduces the original
ambiguity.
The proposed solution is to require that the macro identifier in the
definition of object-like macros be followed by white space or by one of
the basic graphic characters that is not ambiguous. Most code already
uses white space and such code will not be affected. Code such as:
#define x+y z
which actually means
#define x +y z
will also not be affected by the first option. The only cases that are
affected will require a diagnostic, thus eliminating the ambiguity.
Insert a new Constraint in 6.10.3, preferably:
In the definition of an object-like macro there shall be white space
between the identifier and the replacement list unless the latter
begins with one of the 26 graphic characters in the basic character
set other than ( _ or \.
(or equivalent wording) or alternatively:
In the definition of an object-like macro there shall be white space
between the identifier and the replacement list.
PC-UK0285
Committee Draft subsection: 7.19.5.1
Title: Clarify meaning of a failed fclose
Detailed description:
If a call to fclose() fails it is not clear whether:
- it is still possible to access the stream;
- whether fflush(NULL) will attempt to flush the stream;
- whether it is safe to destroy a buffer allocated to the stream by
setvbuf().
There are two possibilities: a failed fclose can leave the stream open as
far as the program is concerned, or it can leave it closed (irrespective of
the state of the underlying entity). The first case is a problem if the
close fails part way through, as it might not be possible to reinstate the
status of the stream. Therefore the second is better, because the
implementation can always carry out those parts of the cleanup that are
visible to the program (such as the second and third items above).
The existing wording - read strictly - also requires the full list of
actions to be carried out successfully whether or not the call fails. This
is clearly an oversight.
Change 7.19.5.1p2 to read:
A successful call to the fclose function causes the stream pointed
to by stream to be flushed and the associated file to be closed.
Any unwritten buffered data for the stream are delivered to the
host environment to be written to the file; any unread buffered
data are discarded. Whether or not the call succeeds, the stream
is disassociated from the file and any buffer set by the setbuf
or setvbuf function is disassociated from the stream, and if the
latter was automatically allocated, it is deallocated.
Note that this does not require anything outside the control of the
implementation to take place if the call fails, while still leaving the
program in a safe state.
PC-UK0261
Committee Draft subsection: 6.10.8
Title: Distinguishing C89 from C9X
Detailed description:
Because of the widespread and important changes between C89 and C9X, it
is very important for an application to be able to determine which
language the implementation is supporting. __STDC_VERSION__ may have
been intended for this purpose, but is not entirely reliable and has the
wrong properties. Amongst other faults, it is nowhere stated that it
will be increased, or even continue to have integer values, so it is not
possible to test in a program designed for long-term portability.
There are two possible solutions to this. The first is to use the
value of __STDC__ as an indicator of the C language variant, by adding
wording like:
__STDC__ shall be set to 2, to indicate the C language described in
this document, rather than 1, which indicated the language described
in ISO/IEC 9899:1990 and ISO/IEC 9899:AMD1:1995.
That is an entirely reliable indicator of whether an implementation
conforms to C89 or C9X, and follows existing practice (many vendors use
a value of 0 to indicate K&R/standard intermediates.) A second approach
is to define the meaning of __STDC_VERSION__ more precisely, by adding
wording like:
It is the intention of this standard that the value of
__STDC_VERSION__ may be used to determine which revision of the
standard an implementation is conforming to, and that it will
remain a constant of type long that is increased at each revision.
That is not entirely reliable, but would probably do.
PC-UK0262
Committee Draft subsection: 6.3.1.3 and 6.10.6
Title: Detecting C89/C9X incompatibilities
Detailed description:
Because of the change in the status of the long and unsigned long types,
it is very important to be able to detect when an application was
conforming in C89 and is undefined in C9X, or has a different effect in
C89 and C9X. The following should be added to 6.3.1.3:
6.3.1.3.1 The C89_MIGRATION pragma
The C89_MIGRATION pragma can be used to constrain (if the state is
on) or permit (if the state is off) integer conversions from higher
ranks than long or unsigned long to types that are explicitly
declared as either long or unsigned long. Each pragma can occur
either outside external declarations or preceding all explicit
declarations and statements inside a compound statement. When
outside external declarations, the pragma takes effect from its
occurrence until another C89_MIGRATION pragma is encountered, or
until the end of the translation unit. When inside a compound
statement, the pragma takes effect from its occurrence until another
C89_MIGRATION pragma is encountered (within a nested compound
statement), or until the end of the compound statement; at the end
of a compound statement the state for the pragma is restored to its
condition just before the compound statement. If this pragma is
used in any other context, the behavior is undefined. The default
state (on or off) for the pragma is implementation-defined.
For the purposes of the C89_MIGRATION pragma, a type is
explicitly declared as long or unsigned long if its type
category (6.2.5) is either long or unsigned long and either of
the following conditions is true:
The type specifier in the declaration or type name that
defines the type is long or unsigned long, in any of the
equivalent forms described in 6.7.2.
The type specifier in the declaration or type name that
defines the type is a typedef name which is not defined in a
standard header and whose type satisfies the previous
condition. This rule shall be applied recursively.
Constraints
If the state of the C89_MIGRATION pragma is on, no value with a
signed integer type of higher integer conversion rank than long or
with an unsigned integer type of higher integer conversion rank than
unsigned long shall be converted to a type that is explicitly
declared as either type long or unsigned long.
If the state of the C89_MIGRATION pragma is on, no function with a
type that does not include a prototype shall be called with an
argument that has a signed integer type of higher integer conversion
rank than long or an unsigned integer type of higher integer
conversion rank than unsigned long.
Recommended practice
A similar constraint should also be applied to programs that use
conversion specifiers associated with long or unsigned long (e.g.
%ld or %lu) for integer values or variables of a higher rank,
where this can be checked during compilation.
The following should be added to 6.10.6
#pragma STDC C89_MIGRATION on-off-switch
PC-UK0269
Committee Draft subsection: 5.1.2.3
Title: Ambiguity in what is meant by "storing"
Detailed description:
The standard assumes the concept of "storing" a value in many places
(e.g. when floating-point values must be converted to their target type)
but nowhere defines it. It is not obvious that argument passing is a
storage operation. Some wording like the following should be added
after paragraph 3 in 5.1.2.3:
The data model used in the abstract machine is that all objects are
sequences of bytes in memory, and that assignment (including to
register objects, argument passing etc.) consists of storing data in
those bytes.
PC-UK0270
Committee Draft subsection: 7.20.4.2, 7.20.4.3
Title: Ambiguity in when exit calls atexit functions
Detailed description:
It is unclear whether exit calls functions registered by atexit as if it
were a normal function, or whether it may unwind the stack to the entry
to main before doing so. This affects whether it is legal to call
longjmp to leave an atexit function to return to a location set up by a
call of setjmp before the call of exit.
This should be clarified, which could include making it explicitly
undefined.
Category 3
PC-UK0227
Committee Draft subsection: 6.7.7
Title: Correct ranges of bitfields in an example
Detailed description:
In 6.7.7p6, example 3 describes the ranges of various bit-fields in terms
of "at least the range". This is because C89 was not clear on what the
permitted ranges of integer types were. These ranges are now tightly
specified by 6.2.6.2, and so the wording of this example should be altered
accordingly:
- change "at least the range [-15, +15]"
to "either the range [-15, +15] or the range [-16, +15]"
- change "values in the range [0, 31] or values in at least the range
[-15, +15]"
to "values in one of the ranges [0, 31], [-15, +15], or [-16, +15]"
PC-UK0249
Committee Draft subsection: 6.4
Title: UCNs as preprocessing-tokens
Detailed description:
In 6.4 the syntax for "preprocessing-token" includes:
identifier
each universal-character-name that cannot be one of the above
In 6.4.2.1 the syntax for "identifier" includes:
identifier:
identifier-nondigit
identifier identifier-nondigit
identifier digit
identifier-nondigit:
nondigit
universal-character-name
other implementation-defined characters
Therefore a universal-character-name is always a valid identifier
preprocessing token, and so the second alternative can never apply. It is
true that 6.4.2.3p3 makes certain constructs undefined, but this does not
alter the tokenisation.
There are two ways to fix this situation. The first is to delete the second
alternative for preprocessing-token. The second would be to add text to
6.4p3, or as a footnote, along the following lines:
The alternative "each universal-character-name" that cannot be one of
the above can never occur in the initial tokenisation of a program in
translation phase 3. However, if an identifier includes a universal
character name that is not listed in Annex I, the implementation may
choose to retokenise using this alternative.
PC-UK0251
Committee Draft subsection: 6.8.5
Title: Error in new for syntax
Detailed description:
C9X adds a new form of syntax for for statements:
for ( declaration ; expr-opt ; expr-opt ) statement
However, 6.7 states that /declaration/ *includes* the trailing semicolon.
The simplest solution is to remove the corresponding semicolon in 6.8.5 and
not worry about the informal use of the term in 6.8.5.3p1. Alternatively
the syntax needs to be completely reviewed to allow the term to exclude the
trailing semicolon.
PC-UK0252
Committee Draft subsection: 6.9
Title: References to sizeof not allowing for VLAs
Detailed description:
6.9p3 and p5 use sizeof without allowing for VLAs. In each case, change
the parenthetical remark:
(other than as a part of the operand of a sizeof operator)
to:
(other than as a part of the operand of a sizeof operator which
is not evaluated)
PC-UK0256
Committee Draft subsection: 7.23.3.7
Title: Wrong time system notation used
Detailed description:
In 7.23.3.7p2, the expression "UTC-UT1" appears. This should read "TAI-UTC".
PC-UK0257
Committee Draft subsection: 7.25.2.1, 7.25.3.
Title: ISO10646 to/from wchar_t conversion functions.
Detailed description:
Often programs that manipulate C source code are themselves written in
C. The purpose of these changes is to make it easier for such
programs to handle universal character names, specified in input files
not source files, portably. They can also be used for interpreting
data files and suchlike, although the preferred way to do this is to
use the appropriate locale; thus, there is no functionality for
converting several wide characters at a time.
The mapping functions below could be implemented by writing
wchar_t u2wc[] = {
L'\U00000000', L'\U00000001', L'\U00000002', ...
}
wint_t toiso10646wc(long iso10646) { return u2wc[iso10646]; }
and the reverse for towciso10646, except that implementation limits
will usually prohibit such a large array.
The functions can be trivially defined to return -1 or WEOF always,
although this is not recommended. This can happen, for instance, if
the wide character set in use does not have any characters which have
known equivalents in ISO10646. It may happen that even if a wide
character does have an equivalent in ISO10646, that it is unreasonable
for the runtime library to know about it, and in such cases the
functions may return -1 or WEOF (this is a quality-of-implementation
issue).
The names of the functions are chosen to not tread on anybody's
namespace. `long' is chosen because int_fast32_t need not be defined
by wctype.h. I would have used (long)WEOF instead of -1 as the error
return for towciso10646, but (long)WEOF might be a valid result: for
instance, wchar_t is 64 bits, WEOF is 0xFFFFFFFF00000000ll, long is 32
bits.
These changes apply to the committee draft of August 3, 1998. Add
after section 7.25.2.1.11 "The iswxdigit function":
7.25.2.1.12 The iswiso10646 function
Synopsis
#include <wctype.h>
int iswiso10646(wint_t wc);
Description
The /iswiso10646/ function tests for those characters for which
/towciso10646/ would not return -1.
Add to section 7.25.2.2.1 "The iswctype function":
iswctype(wc, wctype("iso10646")) // iswiso10646(wc)
Add after section 7.25.3.2.2 "The wctrans function":
7.25.3.3 Wide-character ISO10646 mapping functions
The function /towciso10646/ and the function /toiso10646wc/
convert wide characters to and from ISO10646 code points.
7.25.3.3.1 The towciso10646 function
Synopsis
#include <wctype.h>
long towciso10646(wint_t wc);
Description
The /towciso10646/ function returns the ISO10646:1993 code point
corresponding to /wc/, or -1.
If /towciso10646/ does not return -1, then
/toiso10646wc(towciso10646(wc))/ returns /wc/.
Recommended practice
/towciso10646(L'\Unnnnnnnn')/ returns /0xnnnnnnnnl/ when
/\Unnnnnnnn/ is a universal character name that corresponds to a
wide character.
/towciso10646/ does not return -1 for wide characters
corresponding to those required in the basic execution character set.
7.25.3.3.2 The toiso10646wc function
Synopsis
#include <wctype.h>
wint_t toiso10646wc(long iso10646);
Description
The toiso10646wc function returns the wide character corresponding to
the ISO10646:1993 code point /iso10646/, or /WEOF/.
If /toiso10646wc/ does not return /WEOF/, then
/towciso10646(toiso10646wc(iso10646))/ returns /iso10646/.
PC-UK0276
Committee Draft subsection: various
Title: Assorted editorial changes
Detailed description:
Each of these changes stands alone.
[A]
Clause 3.11: Change to "unspecified behaviour where each implementation
shall document the behaviour for that implementation."
[B]
Clause 3.13: documents -> shall document
[C]
Clause 6.3.2.2: When is an expression "evaluated as a void
expression". The original (6.2.2.2) wording is much clearer and
should continue to be used.
[D]
Clause 6.4.3, para 2: Change "... required character set." to
"... required source character set."? But they may also apply
to the execution character set. Which required character set
is being required?
[We believe that the term "required" is being replaced.]
[E]
Clause 6.1.1.2, para 5. Remove. The standard is not in the
business of specifying quality of implementation diagnostics.
[F]
Clause 6.5, para 4: Change "... are required to have ..." back
to "... shall have ...".
[G]
Clause 6.5.2.2, para 4: Change "An argument may be ..." to
"An argument shall be ...".
[H]
Footnote 85: Delete. It appears to be a glorified forward reference.
[I]
Clause 6.8.4.2, para 3: Change "... expression and no two ..." back to
"... expression. No two ...".
PC-UK0263
Committee Draft subsection: 7.18.3 and 7.19.1
Title: Support for data management
Detailed description:
Because of the change in the status of the long and unsigned long types,
there is a need for an efficient data type that can be used to perform
calculations on mixed file sizes and data object sizes. The obvious
candidate is to define an off_t that is compatible with POSIX, but which
makes sense on non-POSIX systems. The following should be added to
7.19.1 after paragraph 2:
off_t
which is a signed integer type of integer conversion rank not less
than that of any of long, size_t or ptrdiff_t, and capable of holding
both the size of any object allocated by the malloc function
(7.20.3.3) and the maximum number of bytes that a conforming
application can write to a file opened with mode "wb+" and read back
again in an arbitrary order.
Recommended practice
The off_t type should be capable of holding the total size of all
accessible objects at any instant during the execution of a program.
The reason for the above (apparently contorted) wording is to allow
off_t to be a 32-bit type on a system where long, size_t, ptrdiff_t and
the size of files are all 32-bit quantities (e.g. traditional Unix), but
to require it to be longer if any of those are larger or if any single
object larger than 2 GB can actually be allocated. Note that Unix pipes
and similar objects have never had a definite limit on their size.
The following should be added to 7.18.3:
OFF_MIN -2147483647
OFF_MAX +2147483647
Note that POSIX defines off_t to be only a signed arithmetic type, and
not an integer one, but traditional practice (and consistency) requires
that it be integral. To the best of my knowledge, no implementation
that conforms with POSIX has ever defined off_t to be a floating type.
The above specification is therefore what existing POSIX practice is,
though framed in words that are not specific to POSIX.
This type should also have a flag character defined for use in
conversion specifiers in 7.19.6.1, 7.19.6.2, 7.24.2.1, 7.24.2.2. As the
obvious letters were all already taken in C89, it does not matter much
what it is.
PC-UK0264
Committee Draft subsection: 7.8.2
Title: functions for intmax_t and uintmax_t
Detailed description:
Because of the change in the status of the long type, it is necessary to
change many or all uses of that type in many important C89 programs to
other types (often intmax_t). It is highly desirable that it should be
straightforward to do this by automatic textual processing, but should
still produce an efficient result; one obstacle to this is the lack of
equivalents to the labs and ldiv functions for the maximum length types.
The following should be added to 7.8, after paragraph 1:
It declares the type
maxdiv_t
which is a structure type that is the type of the value
returned by the maxdiv function.
And the following should be added to 7.8.2:
7.8.2.3 The maxabs function
Synopsis
[#1]
#include <inttypes.h>
intmax_t maxabs(intmax_t j);
Description
[#2] The maxabs function computes the absolute value of an
integer j. If the result cannot be represented, the behavior is
undefined.
[#3] The maxabs function returns the absolute value.
7.8.2.4 The maxdiv function
Synopsis
[#1]
#include <inttypes.h>
maxdiv_t maxdiv(intmax_t numer, intmax_t denom);
Description
[#2] The maxdiv function computes numer / denom and numer %
denom in a single operation.
Returns
[#3] The maxdiv function returns a structure of type maxdiv_t,
comprising both the quotient and the remainder. The structure
shall contain (in either order) the members quot (the quotient)
and rem (the remainder), each of which have the same type as the
arguments numer and denom. If either part of the result cannot
be represented, the behavior is undefined.
PC-UK0265
Committee Draft subsection: 7.19.6.1, 7.19.6.2, 7.24.2.1, 7.24.2.2
Title: Use a better flag character for intmax_t and uintmax_t
Detailed description:
In C89 Future Library Directions, it was said that "Lower-case letters
may be added to the conversion specifiers in fprintf and fscanf. Other
characters may be used in extensions." Some implementors have ignored
this, which is only to be expected. However, C9X has bent over
backwards to support such gratuitously perverse implementations.
In addition to establishing a very bad precedent, this is a long term
drain on resources. Some of us have to teach C and assist with
the debugging of other people's code; every important but non-mnemonic
facility takes extra time and increases errors.
Because of the importance of conversions using the intmax_t and
uintmax_t types, their flag character should be memorable. If the
committee really feels that 'm' is unacceptable, then it should be 'z'
and the specifier for size_t should be 'j'. 'z' is at least a common
convention for the final item of a sequence.
There should also be a flag character defined for the off_t type,
because of its importance in data manipulation code and as a migration
path. As the obvious letters were all already taken in C89, it does not
matter much what it is.
____ end of United Kingdom Comments; beginning of USA Comments ____
From: Matthew Deane <mdeane@ANSI.org>
The US National Body votes to Approve with comments ISO/IEC FCD 9899,
Information Technology - Programming languages - Programming Language C (
Revision of ISO/IEC 9899:1990). See below comments.
===========================================
Author: Douglas Walls
Comment 1.
Category: Inconsistency
Committee Draft subsection: 7.20.4.2 The atexit function
7.20.4.3 The exit function
Title: atexit call after exit should be undefined
Detailed description:
The description of the atexit function 7.20.4.3p2 states the atexit
function registers the function pointed to by the argument to atexit,
to be called without arguments at normal program termination. The
description of the exit function 7.20.4.3p3 states first all functions
registered by the atexit function are called, in the reverse order of
their registration. Neither the description of atexit nor that of exit
adequately defines how the following program should behave:
void f1(void) {}
void f2(void) { atexit(f1); }
int main(void) { atexit(f2); exit(0); }
Suggested fix:
Add a sentence to 7.20.4.3p3, the exit function stating:
If a call to atexit occurs after a call to exit, the behavior is
undefined.
John Hauser
Comment 1.
Category: Normative change to existing feature retaining the original
intent
?
Committee Draft subsection: F.9.1.4
Title: atan2 of infinite and zero magnitude vectors
Detailed description:
(All numbers in this comment are floating-point.)
Problem:
Section F.9.1.4 defines
atan2(+0,+0) -> +0
atan2(+0,-0) -> +pi
atan2(-0,+0) -> -0
atan2(-0,-0) -> -pi
atan2(+infinity,+infinity) -> +pi/4
atan2(+infinity,-infinity) -> +3*pi/4
atan2(-infinity,+infinity) -> -pi/4
atan2(-infinity,-infinity) -> -3*pi/4
Unfortunately, all of these results, while plausible, are in no way
determinate. Note, for example, that any value from +0 to +pi/2 is
an equally plausible result for atan2(+infinity,+infinity). Defining
atan2 as above is tantamount to the decision made for APL that
0/0 would be 1. The hope presumably is that these values will be
innocuous for most uses. In contrast, the IEEE Standard makes it a
rule that indeterminate cases signal an invalid exception and return
a NaN. The APL decision has since been regretted, and this one may be
too if it prevails.
Fix:
Define the above cases as domain errors.
----------------------------------------------------------------
Comment 2.
Category: Normative change to existing feature retaining the original
intent
?
Committee Draft subsection: F.9.4.4, 7.12.8.4, and F.9.5.4
Title: Infinite results from pow and tgamma
Detailed description:
(All numbers in this comment are floating-point.)
Problem:
Section F.9.4.4 defines all the following to return +infinity:
pow(x,+infinity) x < -1
pow(x,-infinity) -1 < x < 0
pow(-0,y) y < 0 and y not an integer
pow(-infinity,y) y > 0 and y not an integer
Consider, for example, pow(-3,+infinity). We can infer that this
value has infinite magnitude, but unless we can assume the +infinity
exponent is an even or odd integer, we can't say anything conclusive
about the sign of the result. All the cases above share the property
that the correct result is at best ambiguous between +infinity and
-infinity. Currently, CD2 makes the dubious choice of forcing the sign
of the result positive, rather than taking the safer route of returning
a NaN. It's not clear that these cases are important enough to warrant
bending the rules. (Does pow(-3,+infinity) really come up so often
that efficiency is more important than correctness?)
Interestingly, the opposite choice was made for the tgamma function.
For all nonpositive integers x, tgamma(x) could be taken as either
+infinity or -infinity, with no way to choose between them, just as
for pow above. However, Section 7.12.8.4 makes this case an explicit
domain error, and Section F.9.5.4 confirms this decision.
Possible fixes:
Make the pow cases listed above domain errors, consistent with tgamma's
treatment of infinity results. Doing so would also make pow(x,0.5)
almost identical to sqrt(x). (The remaining difference would be the
IEEE mistake that has sqrt(-0) -> -0.)
If that isn't possible, change tgamma to be consistent with pow,
returning +infinity for cases now listed as domain errors.
------------------------------------------------
Comment 3.
Category: Normative change to existing feature retaining the original
intent
?
Committee Draft subsection: F.9.9.1 and 7.12.12.1
Title: fdim of two infinities
Detailed description:
(All numbers in this comment are floating-point.)
Problem:
As defined by Section 7.12.12.1, fdim(x,y) returns x-y if that's
positive, and otherwise returns +0. Unfortunately, as written the
definition gives
fdim(+infinity,+infinity) -> +0
fdim(-infinity,-infinity) -> +0
which is at odds with the fact that x-x is indeterminate if x is
infinite. As with atan2 (in another comment), it appears that a
plausible but essentially arbitrary result has been selected for
convenience, in violation of the IEEE Standard precept that
indeterminate cases return NaN and signal the invalid exception.
Fix:
Define the above cases as domain errors.
------------------------------------------------------------------
Comment 4.
Category: Normative change to intent of existing feature
Committee Draft subsection: F.9.9.2, F.9.9.3, 7.12.12.2, and 7.12.12.3
Title: fmax and fmin of NaN arguments
Detailed description:
Problem:
Sections F.9.9.2 and F.9.9.3 state that if exactly one argument to
fmax or fmin is a NaN, the NaN argument is to be ignored and the other
argument returned. A footnote in Section 7.12.12.2 further clarifies
that
NaN arguments are treated as missing data.
This treatment of NaNs contradicts the semantics given by the
IEEE Standard that a NaN may represent an indeterminate value, the
calculation of which has failed due to an invalid exception. In such
cases, the NaN cannot necessarily be dismissed as ``missing data.''
For reasons of correctness, all IEEE operations---and for that matter
all other CD2 <math.h> functions---require that any NaN argument be
propagated as the result of the function or operation if possible.
(It will not be possible if the function result type is not
floating-point. An exception is made for bit-manipulating functions
such as copysign, and for cases when the function result is fully
determined by the other arguments.) It is both risky and unnecessarily
complicating for fmax and fmin to deviate from this convention.
Fix:
Define fmax and fmin to propagate NaNs in the usual way. The only
cases for which NaNs don't propagate are
fmax(+infinity,NaN) -> +infinity
fmin(-infinity,NaN) -> -infinity
and the same with arguments reversed.
David R Tribble
Comment 1.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.18
Title: <stdint.h> [u]intN_t names
Detailed description:
Section 7.18 describes the exact-, minimum-, and fastest-width integer
types. While I agree that these types are very useful, I feel that
their names are misleading.
It can be argued that programmers are more likely to use the 'intN_t'
names than the other names, if for no other reason than because they
are short. This has the potential of creating problems for programmers
who are compiling their programs on machines that do not provide
efficient N-bit representations or do not provide them at all.
It can also be argued that programmers that use the 'intN_t' more than
likely really meant to use the 'int_leastN_t' types. Such programmers
probably want a type with a guaranteed number of bits rather than an
exact number of bits.
It can be argued further that use of the 'intN_t' names is not portable
because they may not exist in all implementations. (An implementation
is not required to provide these types.)
The main point behind all these arguments is that a short name such as
'int8_t' should represent the most common and the most useful integer
"width" type, and that the "exact width" meaning is inappropriate for
it.
For these reasons, the names of the exact-width and least-width type
names should be changed. Instead of 'int8_t', we should have
'int_exact8_t', and instead of 'int_least8_t', we should have 'int8_t'
(and so forth for the other type names). The standard integer type
names then become:
Exact-width      Minimum-width    Fastest minimum-width
--------------   --------------   ---------------------
int_exact8_t     int8_t           int_fast8_t
int_exact16_t    int16_t          int_fast16_t
int_exact32_t    int32_t          int_fast32_t
int_exact64_t    int64_t          int_fast64_t
uint_exact8_t    uint8_t          uint_fast8_t
uint_exact16_t   uint16_t         uint_fast16_t
uint_exact32_t   uint32_t         uint_fast32_t
uint_exact64_t   uint64_t         uint_fast64_t
The benefits of these names over the current names are:
1. The 'intN_t' types always exist in conforming implementations.
2. The 'intN_t' types have a more intuitive meaning; use of them
indicates the need for integers of well-known minimum widths.
3. Use of the 'intN_t' types covers the most common use of such types,
and thus the existence of a short, convenient name is reasonable.
4. Use of the 'int_exactN_t' types indicates a real need for integers
with exactly known widths; this is probably a rare need and thus
the existence of a bulkier type name is acceptable.
For consistency, the macros in sections 7.18.2.1 and 7.18.2.2 should
also be renamed accordingly:
7.18.2.1 Limits of exact-width integer types
INT_EXACT8_MIN INT_EXACT8_MAX UINT_EXACT8_MAX
INT_EXACT16_MIN INT_EXACT16_MAX UINT_EXACT16_MAX
INT_EXACT32_MIN INT_EXACT32_MAX UINT_EXACT32_MAX
INT_EXACT64_MIN INT_EXACT64_MAX UINT_EXACT64_MAX
7.18.2.2 Limits of minimum-width integer types
INT8_MIN INT8_MAX UINT8_MAX
INT16_MIN INT16_MAX UINT16_MAX
INT32_MIN INT32_MAX UINT32_MAX
INT64_MIN INT64_MAX UINT64_MAX
------------------------------------------------------------------
Eric Rudd
Comment 1.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 7.12.4.4
Title: atan2 function is unclearly specified
Detailed description:
> [#2] The atan2 functions compute the principal value of the
> arc tangent of y/x, using the signs of both arguments to
> determine the quadrant of the return value.
The principal value of the arc tangent has a range from -pi/2 to +pi/2,
but atan2() returns a value that has a range from -pi to +pi. The
definition in N2794 has further complications in the case where x=0 and
the implementation does not support infinities.
It is not specified in the existing definition how to use the signs of
the arguments to determine the quadrant of the return value. Of course,
the underlying assumption is that a vector angle is really being
computed. Thus, I would propose replacing the above sentence with the
following:
"The atan2 functions compute the angle of the vector (x,y) with respect
to the +x axis, reckoning the counterclockwise direction as positive."
Comment 2.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.12.4.4, 7.12.1
Title: atan2 indeterminate cases
Detailed description:
> A domain error may occur if both arguments are zero.
atan2(0., 0.) and atan2(INF, INF) are both mathematically indeterminate,
so, by 7.12.1, a domain error is *required*. Thus, I would propose the
following wording:
"A domain error occurs if both arguments are zero, or if both arguments
are infinite."
Comment 3.
Category: Request for information/clarification
Committee Draft subsection: 7.12.4.4
Title: atan2 range
Detailed description:
There is a minor problem with the statement of range of the atan2
function if signed zeros do not exist in an implementation. If |y|=0.,
and x<0., should the value of atan2(y, x) be -pi or +pi? The range is
stated in 7.12.4.4 to be [-pi,+pi], but I don't see how the range can be
inclusive at both ends unless signed zeros exist in the implementation.
Comment 4.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.12.1, Annex F.9.1.4, F.9.4.4
Title: Annex F considered harmful
Detailed description:
Annex F attempts to define numerical return values for the math
functions in cases where the underlying mathematical functions are
indeterminate. I regard this as a disastrous mistake, especially in
light of 7.12.1:
[#2] For all functions, a domain error occurs if an input
argument is outside the domain over which the mathematical
function is defined.
Thus, F.9.1.4 should read
-- atan2(+/-0, x) returns +/-0, for x>0.
-- atan2(+/-0, +/-0) returns NaN.
-- atan2(+/-0, x) returns +/-pi, for x<0.
-- atan2(y, +/-0) returns pi/2 for y>0.
-- atan2(y, +/-0) returns -pi/2 for y<0.
-- atan2(+/-y, INF) returns +/-0, for finite y>0.
-- atan2(+/-INF, x) returns +/-pi/2, for finite x.
-- atan2(+/-y, -INF) returns +/-pi, for finite y>0.
-- atan2(+/-INF, +/-INF) returns NaN.
-- atan2(y, NaN) returns NaN for any y.
-- atan2(NaN, x) returns NaN for any x.
and F.9.4.4 should read
[#1]
-- pow(NaN, x) returns NaN for any x, even 0
-- pow(x, NaN) returns NaN for any x
-- pow(+/-0, +/-0) returns NaN
-- pow(x, +INF) returns +INF for x>1.
-- pow(1., +/-INF) return NaN
-- pow(x, +INF) returns +0 for 0<x<1.
-- pow(x, -INF) returns +0 for x>1.
-- pow(x, -INF) returns +INF for 0<x<1.
-- pow(x, +/-INF) returns NaN for x<0.
-- pow(+INF, y) returns +INF for y>0.
-- pow(+INF, +/-0) returns NaN.
-- pow(+INF, y) returns +0 for y<0.
-- pow(-INF, y) returns NaN for any y.
I know that the committee has rebuffed earlier suggestions of this sort,
but I hope that the committee will reconsider, since the laws of
mathematics *must* take precedence over mere software standards.
There is also the common-sense rule that when you don't know the answer,
the only responsible reply is "I don't know" (that is, NaN) rather than
making up an answer, however plausible.
Comment 5.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.12.7.4, 7.12.1
Title: pow domain errors
Detailed description:
> [#2] The pow functions compute x raised to the power y. A
> domain error occurs if x is negative and y is finite and not
> an integer value. A domain error occurs if the result
> cannot be represented when x is zero and y is less than or
> equal to zero. A range error may occur.
When x and y are both zero, the result is mathematically undefined, so,
by 7.12.1, a domain error is *required*. Thus, I would recommend
changing 7.12.7.4 paragraph 2 to:
[#2] The pow functions compute x raised to the power y. A
domain error occurs if x is negative and y is finite and not
an integer value. A domain error occurs if x and y are
both zero, or if the result cannot be represented when x is
zero and y is less than zero. A range error may occur.
Comment 6.
Category: Request for information/clarification
Committee Draft subsection: 5.2.4.2.2, Annex F
Title: Rounding behavior in case of ties
Detailed description:
In paragraph 6, rounding is discussed. However, the behavior of
round-to-nearest in case of ties has not been specified. Should it be
stated that this is implementation-specified? Annex F refers to IEC
60559, but does not quote from it, so one is left wondering what is to
be done in case of ties. I think that the C standard should be
reasonably self-contained, and a brief mention of rounding in case of
ties would do the trick.
Comment 7.
Category: Request for information/clarification
Committee Draft subsection: 5.2.4.2.2, 6.5, 7.6
Title: "Exception" inadequately defined
Detailed description:
In subsection 5.2.4.2.2 paragraph #3 the term "exception" is used for
the first time, without definition or forward reference. Is this a
"floating-point exception"? Subsection 6.5, paragraph #5 only confuses
things further:
[#5] If an exception occurs during the evaluation of an
expression (that is, if the result is not mathematically
defined or not in the range of representable values for its
type), the behavior is undefined.
since in 7.6 some attempt is made to define the behavior for
floating-point exceptions. A clarification of these terms is needed.
Comment 8.
Category: Request for information/clarification
Committee Draft subsection: 7.12.1
Title: Backward compatibility issue with floating-point exceptions
Detailed description:
Programs written for ISO 9899-1990 do not have access to the
floating-point exception mechanism. It was safe to pass any argument to
the math functions and test the results later, since, according to 7.5.1
of that document,
"Each function shall execute as if it were a single operation,
without generating any externally visible exceptions."
However, that sentence has disappeared from FCD 9899, which raises
compatibility issues. It's OK to have an exception mechanism, but there
needs to be a guarantee that programs written for ISO 9899-1990 will not
crash because of an exception which is now externally visible (and
unhandled).
Comment 9.
Category: Request for information/clarification
Committee Draft subsection: 7.12.1
Title: Domain and range errors
Detailed description:
In paragraphs #2 and #3, the requirement was dropped that errno be set
to EDOM or ERANGE to reflect a domain or range error. I am curious as
to what use there is in defining domain and range errors, since it
appears that there is no longer any specified means for a program to
determine whether such an error has occurred.
This omission creates compatibility problems for programs conforming to
ISO 9899-1990, since they may rely on errno to detect problems occurring
during evaluation of math functions.
Lawrence J. Jones
Comment 1.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.19.4.4
Title: tmpnam
Detailed description:
tmpnam is required to return at least TMP_MAX different names that are
not the same as the names of any exiting files, but there is no way to
ensure that every possible name isn't already in use by existing files.
Since there is no license for tmpnam to fail until it has been called
more than TMP_MAX times (and, since the behavior in that case is
implementation-defined, it may not even be valid then), it seems that
tmpnam must not return in this case. This is not very useful behavior.
I suggest the following changes:
1) TMP_MAX should be described as the number of possible names that
tmpnam can generate (note that any or all of them might match
existing file names).
2) tmpnam should be allowed to return a null pointer if no acceptable
name can be generated.
3) Calling tmpnam more than TMP_MAX times should not produce
implementation-defined behavior. Rather, it should be
implementation-defined (or unspecified) whether tmpnam simply fails
unconditionally after exhausting all possible names or resets in an
attempt to re-use previously generated names.
------------------------------------------------------------------------
Comment 2.
Category: Request for information/clarification
Committee Draft subsection: 3.18, 6.2.4, 6.2.6, 6.7.8, others
Title: Unspecified and indeterminate values
Detailed description:
The draft refers to both "unspecified" and "indeterminate" values. It
is not clear to me whether these are intended to be synonyms, or if
there is some subtle difference between them. If there is a difference,
my guess is that indeterminate values are allowed to be trap
representations but unspecified values are not. In either case, the
draft should be clarified.
------------------------------------------------------------------------
Comment 3.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.2.5, 6.3.1.1, 6.7.2.2
Title: Enumerated type compatibility
Detailed description:
According to 6.7.2.2, "each enumerated type shall be compatible with an
integer type", but this is vacuously true since 6.2.5p17 says that
enumerated types *are* integer types. I believe that 6.7.2.2 meant to
say that each enumerated type shall be compatible with a signed integer
type or an unsigned integer type (or, perhaps, with char). If a change
is made to 6.7.2.2, a similar change needs to be made to 6.3.1.1.
------------------------------------------------------------------------
Comment 4.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.3.2.3
Title: Conversion of null pointer constants
Detailed description:
6.3.2.3p3 states that assignment and comparison cause null pointer
constants to be converted to null pointers, implying that conversions in
other contexts do not and thus result in implementation-defined results
as per p5. I believe the intent was that any conversion of a null
pointer constant to a pointer type should result in a null pointer, with
assignment and comparison being but examples.
------------------------------------------------------------------------
Comment 5.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.4
Title: UCNs as preprocessing tokens
Detailed description:
Since the syntax and constraints allow any UCN to appear in an
identifier, I believe the syntax rule allowing a UCN to be a
preprocessing token is vacuous and should be removed.
====================
David H. Thornley
Comment 1.
Category: Feature that should be removed
Committee Draft subsection: 7.19.7.7
Title: Declare gets() obsolescent
Detailed description: The gets() function can very rarely be used
safely, and is the source of numerous problems, including the
Great Internet Worm of 1988. It should be declared obsolescent.
Unfortunately, it is widely used, and cannot be simply removed.
------------------------------------------------------------------
Comment 2.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.18
Title: Renaming of types in <stdint.h>
Detailed description: Traditionally, C integer types were assigned
similarly to the *fast* designations: int has been a fast type
of at least 16 bits, while long has been a fast type of at least
32 bits. It is desirable to keep this principle, and so it is
desirable to ensure that the shortest, easiest-to-use integer type
names are fast types. Further, the shortest names are going to be
most used, except among very careful programmers, and are going to
be troublesome if assigned to exact-width types, which may or may
not exist.
The fix:
Rename the types as follows:
intN_t becomes int_exactN_t
int_leastN_t stays the same
int_fastN_t becomes intN_t
------------------------------------------------------------------
Comment 3.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.18.1.1
Title: The meaning of "exact" in <stdint.h>
Detailed description:
The meaning of "exact" in 7.18.1.1 is unclear. A computer with 36-bit
words could define a 36-bit integer type as int32_t, since the only
way to tell the difference would be to cause overflow, which is undefined.
Since it is impossible to tell the difference without causing undefined
behavior, it is difficult to see what portable use int32_t would be.
The main apparent use for exact types is communicating with I/O
devices and programs, and this would be compromised if the types were
inexact.
The fix:
Exact types may not have padding bits.
(Alternately, exact signed types may not have padding bits. The same
difficulty does not exist with exact unsigned types, since the overflow
behavior of unsigned types is well defined.)
----------------------------------------------------------------
Comment 4.
Category: Feature that should be removed
Committee Draft subsection: 6.2.4#4
Title: Remove the long long integer types
Detailed description:
The increased size of computer systems has created a problem in that
many object sizes cannot be expressed with only 32-bit integers.
On the other hand, many existing programs use "long" to mean,
specifically, a 32-bit integer. There are also an unknown number of
programs that rely on "long" as the longest integer type. There is
no good solution for this, so we must find the least bad one.
In the previous C standard, there were four integer types, with
clear meanings. "char" was a type capable of holding a single
character. "short" was a space-conserving type capable of holding
an integer of frequently useful size. "int" was a general-purpose
integer type, being some sort of natural size (but at least 16 bits),
and "long" was the longest integer type. It is now proposed to remove
the meaning of "long" by adding "long long". This leaves the meaning
of "long" as something like int_fast32_t, and "long long" as int_fast64_t.
The addition of "long long" therefore destroys meaning in favor of
introducing redundancy.
The primary reason for not allowing "long long" as a standard integer
type is that it introduces a serious incompatibility between the previous
C standard and the proposed one. In the previous C standard, it was
guaranteed that any type based on a standard integer (such as size_t)
could be converted into a string by casting it to long or unsigned long
and using sprintf() or similar function. DR 067 essentially said that
this was standard and portable. With the proposed standard, it is
no longer portable, since size_t and other such types can be longer than
long. (POSIX defines some such types as being standard C integer
or arithmetic types, and these are likely more important than size_t
and ptrdiff_t.)
The difference between this change and all others proposed is that it
produces a fundamental discrepancy between the previous C standard and
the proposed one. Certainly, the conversion of size_t or similar
types to character strings is a reasonable thing to do. There is a
way to do it in C89, and an entirely different way to do it in the
proposed standard. There is no way to write a program doing such a
conversion that will run in both versions of C, except by using the
preprocessor for conditional compilation.
I do not know how many programs will be affected, or in what way. In
a recent posting to comp.std.c, message ID
<6vfcd6$7ut$1@pegasus.csx.cam.ac.uk>, Nick Maclaren described an
experiment he had performed:
"I took a copy of gcc and hacked it around enough to produce diagnostics
for some of the problem cases, where C9X introduces a quiet change over
C89 in the area of 'long' and 'long long'. However, this hack has the
following properties:
1) It flags only some traps.
2) It produces a large number of false positives.
3) It requires header hacks, and produces broken code.
"I then ran it on a range of widely-used and important public-domain
codes, taken from the Debian 1.3.1 CD-ROM set. Many of these are
effectively the same codes that are shipped with commercial systems,
and others are relied on heavily by many sites.
"Most of the codes used "long" to hold object and file positions, or as a
way of printing an unknown integer type. The ones that I have marked as
"Yes" will almost certainly invoke undefined behaviour if faced with a
C9X compiler where ptrdiff_t is longer than "long", and probably will if
off_t is. The ones that I have marked "Maybe" could well have checks to
prevent this, or were too spaghettified to investigate.
"Only 4 had any reference to "long long" whatsoever, and it was in a
single non-default #if'd out section in 3 of them; one of those defined
a symbol that was never referred to, another was solely for Irix 6 file
positions, and the last could trivially have been replaced by double.
The ONLY program that either had any reference to "long long" by
default, or used it seriously, was gcc itself."
Loss of data printf fails Uses long long
------------ ------------ --------------
apache Yes Yes No
bison No No No
bash Maybe Yes No
cpio Yes No Effectively not
csh Yes No No
diff Maybe No No
elm Build process failed No
exim Yes No No
fileutils Yes No Effectively not
findutils Yes Yes No
flex No No No
gawk Yes Yes No
gcc Build process failed Yes
gnuplot Maybe No No
gzip Yes No No
icon Yes No No
inn Build process failed No
nvi Maybe Yes No
pari Maybe No No
perl Build process failed Effectively not
sendmail Yes Yes For Irix 6
trn Maybe No No
wu-ftpd No Yes No
zip Yes Yes No
The problems will show up only when dealing with sufficiently long data
objects, but I see no reason why any of those programs should not
eventually be applied to a file of more than four gigabytes. If so, the
program will fail in odd ways, likely corrupting data. Assuming that the
programs selected are representative, somewhere between one-third and
one-half of all large, heavily-used C programs are likely to mishandle
large files or memory objects. Since many of these programs are not
commercially supported, the task of changing them to be valid C will fall
to volunteers. Even with commercial software, the difficulties involved
in sorting out all possible problems mean that these programs will be
untrustworthy.
It is also desirable to keep fseek() and ftell() usable as is, rather
than creating more functions, changing the return types of the current
ones, or going to the less useful fgetpos() and fsetpos(). Also,
pointers were guaranteed to fit in some integral type in the previous
standard, and many programs may have taken advantage of that.
Since the proposed "long long" type creates an unbridgeable discrepancy
between C89 and C9X, and since it renders a large and unknown number of
programs untrustworthy in unknown but probably dangerous ways, I think it
a very bad idea.
It is not necessary to require that "long" become a 64-bit type, but
merely to require that all appropriate "_t" types be representable
as long or unsigned long. I see no obvious need for a 64-bit type
unless required by these types, but there is certainly no reason
not to have one. It is possible to require a conforming implementation
to have "int_fast64t", "int_least64t", and the corresponding unsigned
types. It is possible to list "long long" as a common extension, in
which case it does no harm but the aesthetic.
Let us consider how programs will be affected by the changes.
First, there is no need for long to exceed 32 bits unless object size
may exceed that. Many current systems will have no problem with it.
Second, new programs written can use int_least32_t in place of long,
and int_fast64_t in place of "long long". This will provide compatibility
with existing ABIs using "long" and "long long" types.
Third, compiler manufacturers will undoubtedly provide various ways to
bridge the gap. For a long time to come, compilers will doubtless
provide options to restrict "long" to 32 bits, for use in compiling
older programs.
There is a class of programs that will be seriously affected: those that
need "long" to be 32 bits, and need "size_t" or "off_t" or some such to
be more than 32 bits. Many of these programs are therefore irretrievably
broken. Some of them have been written with "long long", and such
types as size_t being "unsigned long long". The authors of these
programs have already demonstrated a willingness to discard portability,
in that the programs will only work on certain nonstandard
implementations.
I hope that they have been careful to record what uses of "long" should
not exceed 32 bits and what uses should be the longest possible type.
I believe this class of programs is much smaller than the class that
would be adversely affected by "long long"; most of the programs
surveyed above do not in fact use "long long", while a frightening
number will suffer quiet breakage if "long long" is a standard C
integer type.
Fix:
Remove "long long" and "unsigned long long" from the list of standard
C integer types. It may be listed as a common extension, and <stdint.h>
may require int_fast64_t and int_least64_t if desired.
------------------------------------------------------------------
Comment 5.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.19.6.1, 7.19.6.2, 7.24.2.1, 7.24.2.2
Title: New type declarators for printf() and scanf()
Detailed description:
The conversion specifier characters in the format string listed in
<inttypes.h> are clumsy, awkward, and difficult to remember. I propose
a somewhat different syntax for them, substituting length modifiers
for additional conversion specifier characters.
We need to add specifiers for int*_t, int_fast*_t, int_least*_t, and
intmax_t, as well as their unsigned counterparts. We can use a
notation with, for example, :32f as a length modifier for int_fast32_t
types. This has the further advantage that it limits the need for
using more letters. Note that this is not an actual parameterized
implementation, although it could certainly be made so in a later
version of the standard.
The fix:
Add to the end of 7.19.6.1 [#7]
: Specifies an extended integer type. It is followed by a number
and letter, or by a string of letters. If followed by a number
and letter, the number is the number of bits in the type, and
the letter is x for an exact type, f for a fast type, and l for
a least type.
The number and letter shall match an existing extended integer
type in stdint.h. Alternately, the colon shall be followed by
"max", "size", or "ptr", specifying a length of intmax_t, size_t,
or ptrdiff_t respectively.
Add to the end of 7.19.6.2 [#11]
: Specifies a pointer to an extended integer type. It is followed
by a number and letter, or by a string of letters. If followed
by a number and letter, the number is the number of bits in the type,
and the letter is x for an exact type, f for a fast type, and
l for a least type. The number and letter shall match an
existing extended integer type in stdint.h. Alternately,
the colon shall be followed by "max", "size", or "ptr",
specifying a length of intmax_t, size_t, or ptrdiff_t
respectively.
The same should be appended to 7.24.2.1 and 7.24.2.2 respectively.
------------------------------------------
Robert Corbett
Comment 1.
Category: Request for information/clarification
Committee Draft subsections: F.9, G.5
Title: the sign convention should be explained
Detailed description:
The sign convention used in Sections F.9 and G.5 should be
described explicitly. For example, I assume that
asin(±0) returns ±0
means
asin(-0) returns -0
and
asin(+0) returns +0,
but I could imagine someone thinking that either -0 or +0
could be returned for asin(-0) or asin(+0).
=============================================================
Comment 2.
Category: Normative change to intent of existing feature
Committee Draft subsection: F.9.4.4
Title: pow(+1, ±inf) should return +1
Detailed description:
The function call pow(1, ±inf) should return 1 and should not
raise any exceptions. The mathematical basis for changing this
special case is stronger than the basis for defining pow(x, ±0)
to be 1. This special case is less important than the
pow(x, ±0) case, but it is useful for essentially the same
reasons.
=============================================================
Comment 3.
Category: Normative change to intent of existing feature
Committee Draft subsection: G.5, paragraph 7
Title: cpow(0, z) should not always raise exceptions
Detailed description:
The FCD defines cpow(z, c) to be equivalent to cexp(c clog(z)).
No special cases are given in Section G.5.4. Since clog(0+i0)
raises the divide-by-zero exception, cpow(0+i0, z) must raise
the divide-by-zero exception regardless of the value of z. In
the case where z = x+i0, cpow is required to raise both the
divide-by-zero exception and the invalid exception. There is
no reasonable basis for raising these exceptions.
Recommendation:
The function cpow(0+i0, z) should not raise the divide-by-zero
exception unless the result is an infinity. It should not raise
the invalid exception unless the result is a NaN.
=============================================================
Comment 4.
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.5.3.4, paragraph 2
Title: the operand of sizeof should not be a VLA
Detailed description:
One of the properties of C that makes it suitable for writing
systems software is that all operations are explicit. The
sizeof operation as defined for operands that are VLAs
introduces implicit operations for the first time. The
implicit operations are particularly problematic if the bounds
expression has side effects. It is not clear from the current
draft if the bounds expression will be re-evaluated when the
sizeof operator is evaluated.
Recommendation:
Add a constraint prohibiting the operand of sizeof from being
a VLA or a VLA type.
=============================================================
Comment 5.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.2.2
Title: there should explicitly be no linkage between different identifiers
Detailed description:
Many programmers believe that
#include <stdio.h>
const int i = 0;
const int j = 0;
int
main(void)
{
printf("%d\n", &i == &j);
return (0);
}
is allowed to print 1 (followed by a newline). The standard does
not explicitly state that no linkage exists between the objects
designated by different identifiers. While the intent of the
committee is clear to me, I find it hard to convince others based
on the current text of the standard. If this issue is not
addressed, I suspect a defect report will be needed to resolve it.
Recommendation:
The standard should explicitly state that no linkage exists
between different identifiers.
=============================================================
Comment 6.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.4.4.2
Title: conversion of floating-point constants
Detailed description:
The first edition of the C standard does not require
translation-time conversion of floating-point constants. The
current draft does not quite manage to state that floating-point
constants must be converted at translation time, but I believe
that the committee intended it to so stipulate.
I know of one C compiler that provides an option to have the
values of floating-point constants depend on the rounding mode
in effect at the time the containing expression is evaluated.
I know of another compiler that raises overflow or underflow
exceptions at execution time if the value of a floating-point
constant overflowed or underflowed when it was converted at
translation time. These compilers are products of major systems
vendors (not Sun in either case).
Recommendation:
The standard should explicitly state that floating-point
constants must be converted at translation time. It should
state that decimal floating-point constants of the same type
and mathematical value must produce the same value in all
contexts. Similarly, it should state that hexadecimal
floating-point constants of the same type and mathematical
value must produce the same value in all contexts. The normative
Section F.7.2 should state that evaluation of floating-point constants is
not allowed to raise execution-time exceptions.
John Hauser
Comment 1.
Category: Inconsistency
Committee Draft subsection: 7.12.6.4 and F.9.3.4
Title: frexp infinity result not allowed
Detailed description:
Problem:
Section 7.12.6.4 specifies that frexp returns a value with ``magnitude
in the interval [1/2, 1) or zero'', but Section F.9.3.4 insists that
frexp(x,p) returns x if x is an infinity.
Fix:
In Section 7.12.6.4, allow frexp to return infinity if its first
argument is infinite.
---------------------------------------------
Comment 2.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: F.9.3.10
Title: log2(1) should equal log(1) and log10(1)
Detailed description:
(All numbers in this comment are floating-point.)
Problem:
Sections F.9.3.7 and F.9.3.8 mandate that log(1) and log10(1) each
return +0, while Section F.9.3.10 omits this requirement for log2(1).
There is no reason for base-2 logarithms to differ from base-10 and
natural logarithms in this detail.
Fix:
Add to Section F.9.3.10 the rule:
-- log2(1) returns +0.
------------------------------------------------------------------
Comment 3.
Category: Inconsistency
Committee Draft subsection: 7.12.7.4 and F.9.4.4
Title: Inconsistent pow domain errors
Detailed description:
(All numbers in this comment are floating-point.)
Problem:
Section 7.12.7.4 states for pow(x,y):
A domain error occurs if x is negative and y is finite and not an
integer value.
Under the IEEE Standard (IEC 60559), a domain error is signaled by
raising the invalid exception flag, with a NaN returned from the
function if the result type is floating-point. But Section F.9.4.4
defines
pow(-infinity,y) -> +0 y < 0 and y not an integer
pow(-infinity,y) -> +infinity y > 0 and y not an integer
and does not permit these cases to raise the invalid exception flag,
even though 7.12.7.4 calls them domain errors.
In a slightly different vein, Section 7.12.7.4 also says that
A domain error occurs if the result cannot be represented when x is
zero and y is less than or equal to zero.
whereas F.9.4.4 defines
pow(+0,+0) = pow(+0,-0) = pow(-0,+0) = pow(-0,-0) -> 1
The definition in Section F.9.4.4 strongly implies that pow(0,0) should
return 1 in _every_ implementation, because nothing ever prevents
it. Since a result of 1 can certainly always ``be represented'', the
wording in Section 7.12.7.4 is incongruous with Annex F for the case
pow(0,0).
Suggested fix:
Adjust the function description in Section 7.12.7.4 to say:
A domain error occurs if x is finite and negative and y is finite
and not an integer value. A domain error occurs if the result
cannot be represented when x is zero and y is less than zero.
------------------------------------------------------------------
Comment 4.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: F.9.5.3
Title: lgamma(1) and lgamma(2) should return +0
Detailed description:
(All numbers in this comment are floating-point.)
Problem:
When a floating-point function returns a zero result and the sign
of this zero result is indeterminate (neither +0 nor -0 is a more
legitimate result than the other), Annex F usually mandates that the
function return +0 and not -0. For example, Annex F specifies that
acos(1), acosh(1), log(1), and log10(1) each return +0. This policy
is consistent with the IEEE Standard's (IEC 60559's) requirement
that x-x always return +0 and not -0 (assuming finite x and the usual
round-to-nearest rounding mode), even though the sign on the zero
result in this case is wholly indeterminate for any x.
Section F.9.5.3 neglects to require that lgamma(1) and lgamma(2) each
return +0, consistent with the other such cases.
Fix:
Add to Section F.9.5.3 the rules:
-- lgamma(1) returns +0.
-- lgamma(2) returns +0.
------------------------------------------------------
Thomas MacDonald
Comment 1.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.8.5.3 The for statement
Title: definition of the for statement
Detailed description:
The current description of the for statement is in terms of a syntactic
rewrite into a while loop. This causes problems as demonstrated by the
following example:
enum {a,b};
for(;;i+=sizeof(enum{b,a})) j+=b;
is not equivalent to
enum {a,b};
while(1){j+=b; i+=sizeof(enum{b,a});}
because a different b is added to j.
Change the description of the for statement to:
6.8.5.3 The for statement
The following statement
for ( clause-1; expression-2; expression-3 ) statement
behaves as follows. The expression expression-2 is the controlling
expression. If clause-1 is an expression it is evaluated as a void
expression before the first evaluation of the controlling expression. The
evaluation of the controlling expression takes place before each execution
of the loop body. The evaluation of expression-3 takes place after each
execution of the loop body as a void expression. Both clause-1 and
expression-3 can be omitted. An omitted expression-2 is replaced by a
nonzero constant.
Also, delete the "forward reference" to the continue statement.
----------------------------------------------------------------
Comment 2.
Category: Normative change to intent of existing feature
Committee Draft subsection: 5.2.4.1, 6.2.1, 6.8, 6.8.2, 6.8.4, 6.8.5
Title: define selection and iteration statements to be blocks.
Detailed description:
A common coding convention is to allow { } for all iteration and selection
statements. When compound literals were introduced, this convention was
inadvertently compromised. Consider the following example:
struct tag { int m1, m2; } *p;
while (flag)
flag = func( p = & (struct tag) { 1, 2 } );
p->m1++;
If { } are introduced as follows:
while (flag) {
flag = func( p = & (struct tag) { 1, 2 } );
}
p->m1++; // Error - compound literal went out of scope
then the example has undefined behavior because the compound literal goes
out of scope after the while loop finishes execution.
The recommended change is:
5.2.4.1 (page 17) Translation limits: 127 nesting levels of blocks
6.2.1 (page 25) scopes of identifiers: ... the identifier
has "block scope," which terminates at the end of the
associated block.
6.5.2.5 (page 67-68) change example 8
6.8 (page 119) statements: A "block" allows a set ...
(move 6.8.2 semantics to 6.8)
6.8.2 (page 120) Compound statement: Semantics: change semantics to:
A "compound statement" is a block.
6.8.4 (page 121) Selection statements: (add to semantics)
A selection statement is a block whose scope is a strict
subset of its enclosing block. All associated substatements
are blocks whose scopes are strict subsets of the
associated selection statement block.
6.8.5 (page 123) iteration statements: (add to semantics)
An iteration statement is a block whose scope is a strict
subset of its enclosing block. The associated loop body is
a block whose scope is a strict subset of the associated
iteration statement block.
Also, forward references need to be checked.
-----------------------------------------------------------------------
Comment 3.
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.7.5.2
Title: VLA side-effects
Detailed description:
Currently 6.7.5.2 states that it is unspecified whether side effects are
produced when the size expression of a VLA declarator is evaluated.
Consider the following complicated sizeof expression:
void *p;
sizeof( ** (int (* (*) [++m]) [++n]) p )
In this example, the sizeof operand is an expression. The pointer "p" is
cast to a complicated type:
pointer to array of pointers to array of int
which can be shown in a flattened out view (read R to L)
int [++n] * [++m] *
The sizeof operand is an expression containing two deref operators. Once
these deref operators are applied, the final type of the expression is:
int [++n] *
The translator does not need to evaluate the "++n" size expression to
determine the pointer size. However, it's not difficult for the
translator to do so anyway.
It's a common implementation technique to ignore unused portions of the
expression tree when sizeof is involved. One way to view the sizeof
expression is the following parse tree:
sizeof (TYPE: size_t)
|
* (TYPE: int [++n] *)
below not needed --> |
so discard? * (TYPE: int [++n] * [++m])
|
(cast) (TYPE: int [++n] * [++m] *)
|
p (TYPE: void *)
When "sizeof" is encountered, the translator need not look any further
than its immediate operand to determine the size. When looking at the
immediate operand of "sizeof" the only side effect noticed is "++n" and
this means the translator must look all the way down into the type
"int [++n] *" to find the side-effect (which is straightforward).
If C9X requires implementations to evaluate "++m" also, then it gets
harder because it's not always obvious where to find the "++m"
expression. More elaborate machinery is needed to find the unevaluated
VLA expressions. If a translator is designed from scratch, this can be
built in. However, most of us live with existing technology.
The compromise reached by WG14 here, is that we do not require side
effects to be evaluated inside a VLA size expression. Some have objected
to that compromise. The presence of mixed code and decls eliminates more
of the distinctions between declarations and statements. They can appear
in the same places now. The critics of the compromise have more
ammunition here: the "no required side effects" clause is more noticeable
with mixed code and declarations.
There are other places where an evaluation is not strictly needed:
void f(int n; int a[++n]) { } /* "a" is converted to a pointer anyway */
int (*p)[n++]; // don't need array size unless doing bounds checking
I'm sure there are others. In general, if a VLA declarator can be written
as an incomplete type, as in:
void f(int n; int a[]) { }
int (*p)[];
the side effect might not be evaluated by the translator (because the size
isn't needed). For an example such as:
int (*p)[++n] = malloc(size);
p++;
The intent is that the side effect be evaluated since the value is needed
to increment the pointer p. However, if all references to p are of the
form:
(*p)[i]
then the value produced by the side effect is never needed.
The wording is tricky, because the side effects are only
evaluated when the size is needed.
-----------------------------------------------------------------------
Comment 4.
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.7.3
Title: property M for restricted pointers
Detailed description:
Summary
-------
A 'pure extension' to the current specification of the 'restrict'
qualifier is proposed. It would make some additional, natural
usages of the qualifier conforming without imposing an additional
burden on compilers. This proposal was developed in response to
PC-UK0118, and, possibly after further refinement, is intended to
become part of a public comment.
The problem with the current specification
------------------------------------------
The issue addressed here was raised in part 2) of PC-UK0118 and
concerns the use of two different restricted pointers to reference
the same object when the value of that object is not modified (by
any means). The same issue can arise for a library function with
one or more restrict-qualified parameters when it is called with
string literal arguments.
Example A:
typedef struct { int n; double * restrict v; } vector;
void addA(vector a, vector b, vector c)
{
int i;
for(i=0; i<a.n; i++)
a.v[i] = b.v[i] + c.v[i];
}
Allowing calls of the form addA(x,y,y) would not inhibit
optimization of the loop in 'addA', but the current specification
makes the behavior of such calls undefined.
Example B:
printf("%s", "s");
An implementation should be able to "pool" the string literals
so that "s" is (or, more accurately, points to) a
subobject of (the object pointed to by) "%s", but with the current
specification of restrict, the fact that the first parameter of
printf is restrict-qualified appears to prohibit such pooling.
The problem with the specification proposed in PC-UK0118
--------------------------------------------------------
Recall that PC-UK0118 proposed that second sentence of paragraph 4
in 6.7.3.1 should be changed to:
Then either all references to values of A shall be through
pointer expressions based on P, or no reference to A
(through expressions based on P or otherwise) shall modify
any part of its contents during the execution of B.
This has the intended effect for examples A and B above, but
consider:
Example C:
typedef struct { int n; double * restrict v; } vector;
void addC(vector * restrict x, vector * restrict y)
{
int i;
for(i=1; i<x->n; i++)
x->v[i] += y->v[i-1];
}
There is a problem for a call of the form addC(&z,&z), which the
current specification gives undefined behavior. The PC-UK0118
proposal appears to give this call defined behavior, and so renders
the restrict qualifiers on the parameters ineffective in promoting
optimization of the loop. In particular, because z.v is not
modified, it can be referenced as both x->v and y->v within addC.
It follows that although z.v[i] is modified, it is referenced
through only one restricted pointer object, z.v (designated first
as x->v and then as y->v within addC). Thus there is no undefined
behavior, and so optimization of the loop in addC is inhibited by
the possibility that x->v[i] and y->v[i] refer to the same object.
Consider also an example motivated by the semantics of Fortran
dummy arguments:
Example D:
void addD(int n, int * restrict x, int * restrict y)
{
int i;
for(i=0; i<n; i++) {
x[i] += x[i+n];
y[i+n] += y[i];
}
}
For a call of the form addD(100,z,z+100), the last half of the
array referenced through x overlaps the first half of the array
referenced through y. But since the overlapping halves are not
modified, there is no optimization benefit from giving the call
undefined behavior, as both the current specification and the
PC-UK0118 proposal seem to do.
The new proposal
----------------
If we paraphrase the PC-UK0118 formulation as:
If a "direct target" of a restricted pointer is
modified, that target cannot be aliased (referenced
through pointers not based on the restricted pointer).
then the formulation proposed below can be paraphrased as:
If a "direct or indirect target" of a restricted
pointer is modified, the direct target cannot be
aliased.
It follows that if a reference involves a chain of pointers, all
the pointers in the chain are restricted, and the final target
object is modified, then there should be no aliasing of the targets
of any of the pointers in that chain. This allows aliasing of an
unmodified object, but not of a restricted pointer used indirectly
to reference a modified object, even if its direct target is
unmodified.
Changes proposed for 6.7.3.1 (marked with | in left margin)
-----------------------------------------------------------
=== Change paragraph #4 to read:
[#4] During each execution of B, let A be the array object
that is determined dynamically by all references through
pointer expressions based on P.
| Let A1 denote the smallest array subobject of A that
| contains all elements of A that have property M, defined
| as follows. An object X has property M if either
| 1) X is modified during the execution of B, or
| 2) X, or a subobject of X, is a restrict-qualified
| pointer on which &E is based, for some lvalue E
| used during the execution of B to designate an
| object that has property M.
Then all references to values of A shall be through pointer
| ^
| A1
expressions based on P.
=== Change example 1 to read:
1. The file scope declarations
int * restrict a;
int * restrict b;
extern int c[];
assert that if an object is referenced using the value of one
of a, b, or c,
| ^
| and that object is modified anywhere in the program,
then it is never referenced using the value of either of the
other two.
=== Change example 3 to read:
3. The function parameter declarations
| void h(int n, int * restrict p,
| int * restrict q, int * restrict r)
{
int i;
for (i = 0; i < n; i++)
p[i] = q[i] + r[i];
}
| illustrate how an unmodified object can be aliased through two
| restricted pointers. In particular, if a and b are disjoint
| arrays, a call of the form h(100, a, b, b) has defined behavior,
| because array b is not modified within function h.
Effect of changes on the examples
---------------------------------
For Example A, the call addA(x,y,y) has defined behavior from the
changes, as desired. In particular, because b.v[i] is not modified
during such a call, there is no requirement that all references to
it must be based on b.v. Thus the reference c.v[i] is also
allowed. The same reasoning applies with the roles of b and c
interchanged.
For Example B, there is no conflict between the restrict qualifier
and literal pooling, because the literals are not modifiable.
For Example C, a call of the form addC(&v,&v) has undefined
behavior, as desired. The analysis is as follows: Because x->v[i] is
modified, and &(x->v[i]), or (x->v)+i, is based on the restricted
pointer x->v, which is a subobject of *x, it follows that *x has
property M. Therefore all references to *x must be through a
pointer based on x. In particular, y is not based on x, and so *y
must not be the same object as *x, as it would be for the call
addC(&v, &v).
For Example D, the call of the form addD(100,z,z+100) has defined
behavior, as desired, because although the arrays referenced
through the two restricted pointer parameters overlap, the elements
in the overlap are not modified.
Conclusion
----------
Compared to either the current specification or the PC-UK0118
proposal, the changes proposed here do a better job of specifying
undefined behavior only where there is a real opportunity for
better optimization. They admittedly result in a more complicated
specification, but they do not impose an additional burden on an
implementation.
-----------------------------------------------------------------------
Comment 5.
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.7.3
Title: allow restricted pointers to point to multiple objects
Detailed description:
Currently, restricted pointers can only point to a single object
(dynamically determined) during their lifetime. Recently the committee
changed C9X such that "realloc" always allocates a new object. This means
that using "realloc" with restricted pointers is no longer permitted.
int * restrict p = malloc(init_size);
...
p = realloc(p, new_size); // error - new object
Change the specification of "restrict" to allow restricted pointers
to point to multiple objects during their lifetime.
-----------------------------------------------------------------------
Comment 6.
Category: Normative change to intent of existing feature
Committee Draft subsection: 6.7.3
Title: allow type qualifiers inside [] for params.
Detailed description:
One impediment to effective usage of restricted pointers is that they are
not permitted in array parameter declarations such as:
void func(int n, int a[restrict n][n]);
Instead, the user must write the declaration as:
void func(int n, int (* restrict a)[n]);
This form of declaration is far more cryptic and, therefore, harder to
understand. The recommendation is that all type qualifiers be allowed in
the ``top type'' of a parameter declared with an array type.
---------
Peter Seebach
Comment 1.
Category: Feature that should be included
Committee Draft subsection: <string.h>
Title: strsep() function
Detailed description: ...
The strsep() function was proposed for the standard. It was not
voted in because there was a lack of general interest at the meeting
in question. However, it's a trivial feature, and a very useful one,
and I'd like to see it added. Complete thread-safe implementation
follows:
/*
* Get next token from string *stringp, where tokens are possibly-empty
* strings separated by characters from delim.
*
* Writes NULs into the string at *stringp to end tokens.
* delim need not remain constant from call to call.
* On return, *stringp points past the last NUL written (if there might
* be further tokens), or is NULL (if there are definitely no more
tokens).
*
* If *stringp is NULL, strsep returns NULL.
*/
char *
strsep(char **stringp, const char *delim)
{
char *s;
const char *spanp;
int c, sc;
char *tok;
if ((s = *stringp) == NULL)
return (NULL);
for (tok = s;;) {
c = *s++;
spanp = delim;
do {
if ((sc = *spanp++) == c) {
if (c == 0)
s = NULL;
else
s[-1] = 0;
*stringp = s;
return (tok);
}
} while (sc != 0);
}
/* NOTREACHED */
}
Comment 2.
Category: Inconsistency
Committee Draft subsection: several; primarily 3.x
Title: Indeterminate values and trap representations.
Detailed description:
It is unclear whether or not our intent is that an access to an
indeterminately valued object of character type invokes undefined
behavior. In C89, the only argument which allows an implementation to
abort upon access to an indeterminately valued object of any type is
the argument that, since the definition of undefined behavior
includes "access to indeterminately valued objects", all such
access invokes undefined behavior. There is no other way to reach
that conclusion, but it is clearly the one we want; thus, we meant
to assert that all access to indeterminately valued objects yields
undefined behavior.
However, it has become clear that many people want a guarantee that
access through lvalues of character type cannot be undefined behavior.
This is for two reasons: struct padding, and memcmp. A discussion at
the Santa Cruz meeting came to the conclusion that we would like for
this to be permitted.
The material on "trap values" appears to provide adequate guidance to
when access to an object *may* yield undefined behavior. It may be
that simply removing the reference to indeterminately valued objects
from the definition of undefined behavior will fix this.
However, it is too complicated to suggest that any change would be
merely editorial.
Additionally, whatever our intent may have been, it seems fairly clear
(to me, anyway) that the current wording renders access to
indeterminately valued objects through lvalues of character type
undefined behavior, and if we wish to change this, the change is
normative, even if it's what we thought we meant all along.
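The kind of access at issue can be sketched as follows; the guarantee being asked for is that this loop is well-defined even when some of the bytes (struct padding, say) hold indeterminate values:

```c
#include <stddef.h>

/* Examine an object's representation, padding bytes included, through
   lvalues of character type - the memcmp-style access discussed above. */
size_t count_zero_bytes(const void *obj, size_t size)
{
    const unsigned char *p = obj;
    size_t zeros = 0;

    for (size_t i = 0; i < size; i++)
        if (p[i] == 0)       /* reads every byte, padding included */
            zeros++;
    return zeros;
}
```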
Comment 3.
Category: Clarification
Committee Draft subsection: Section 7
Title: Modification of non-const qualified arguments by library
functions
Detailed description:
In several places, library functions take non-const-qualified pointers
as arguments, to indicate that the argument may be modified. It is
unclear whether or not this gives the implementor license to modify the
argument in all cases, or only as described in the semantics for a
function.
A particular example is
strtok("a", "b");
In this example, a careful reading of the description of strtok makes
it clear that there is no point at which the first argument is
modified. However, there are implementations which abort execution
when they reach this, because they *do* modify the first argument
and it's a string literal, so this introduces undefined behavior.
A discussion at Santa Cruz came to the conclusion that it's probably
intended that all library functions may "modify" any object they have
a non-const-qualified pointer to, as long as there are no visible
modifications not described in the text.
Wording for this ought to be put into the description of library
functions in general.
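Until such wording exists, portable code has to assume the implementation may write to the first argument; a defensive sketch (the function name and fixed buffer size are illustrative assumptions):

```c
#include <string.h>

/* Count tokens without ever handing strtok a string literal: the input
   is first copied into a writable array, so even an implementation that
   always writes to its first argument stays within defined behavior. */
size_t count_tokens(const char *s, const char *delims)
{
    char buf[256];                 /* assumption: inputs shorter than 256 */
    size_t n = 0;

    strncpy(buf, s, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *t = strtok(buf, delims); t != NULL; t = strtok(NULL, delims))
        n++;
    return n;
}
```

With this, `count_tokens("a", "b")` is safe on every implementation, whereas `strtok("a", "b")` is the contested case.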
------------------
Andrew Josey
Comment 1.
Category: Feature that should be included
Committee Draft subsection: 7.19.8.2
Title: fwrite when size==0
Detailed description:
Comment on 7.19.8.2 The fwrite function #3
The C standard is inconsistent regarding fread() and fwrite() when
size==0. In fread(), there's an explicit sentence which specifies
that if either size or nmemb is 0, the return value is zero. I
believe there should be an equivalent sentence in fwrite().
All UNIX implementations we know of (and the Single UNIX Specification)
return 0 for both functions when either size or nmemb is zero because
they essentially turn into calls to read() or write() of size*nmemb bytes.
The C standard was supposed to codify existing practice, and the
text does not match this here.
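Until the wording is aligned, callers can make the zero-size case explicit themselves; a small wrapper sketch (the function name is illustrative):

```c
#include <stdio.h>

/* fread is specified to return 0 when size or nmemb is 0; this wrapper
   gives fwrite the same guarantee explicitly, matching the UNIX
   practice described above, instead of relying on the FCD text. */
size_t write_records(FILE *f, const void *data, size_t size, size_t nmemb)
{
    if (size == 0 || nmemb == 0)
        return 0;
    return fwrite(data, size, nmemb, f);
}
```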
-----
Lawrence J. Jones
Comment 1.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.1.4
Title: Library macro requirements
Detailed description:
Library functions are allowed to be implemented as function-like macros,
but it is not entirely clear how closely such macros must match the
behavior of the actual function. In particular, it is clear that a
macro must be usable in the same contexts as a function call and accept
similar kinds of expressions as arguments, but it need not have the same
sequence points. What is not clear is whether it must accept exactly
the same argument types as the function; that is, whether it must
perform the same argument type checking and conversion that a call to a
prototyped function would.
I believe that reliable use of such macros requires a guarantee that
this argument type checking and conversion occur, and I suggest changing
the draft appropriately. Note that this has some impact on
implementations, but not a lot. The conversion is easily handled by a
judicious use of casts. Handling the type checking is not as obvious,
but is equally simple: at most, all that is required is to add a call to
the real function in a context where it will not be evaluated such as
the argument of sizeof (the value of which can then be discarded).
The committee should also carefully consider adding these requirements
for library macros where a complete prototype is given.
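The sizeof technique described above can be sketched with a hypothetical library function half(): the unevaluated call inside sizeof makes the compiler check the argument against the real prototype, and the cast performs the conversion a prototyped call would.

```c
/* Hypothetical library function... */
double half(double x)
{
    return x / 2.0;
}

/* ...and a macro version that still type-checks and converts its
   argument: sizeof(half(x)) checks x against the prototype without
   evaluating the call (the comma operator discards its value), and
   (double)(x) performs the argument conversion a prototyped call would. */
#define half(x) ((void)sizeof(half(x)), (double)(x) / 2.0)
```

Note that `half(3)` now converts the int argument to double exactly as a prototyped call would, while `(half)(3.0)` still calls the real function.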
------------------------------------------------------------------------
Comment 2.
Category: Inconsistency
Committee Draft subsection: 7.19.6.2, 7.24.2.2
Title: Input failures
Detailed description:
Paragraph 15 of the description of fscanf talks about what happens when
end-of-file is encountered; the last part says that if valid input for
the current directive is immediately followed by end-of-file, "execution
of the following directive (other than %n, if any) is terminated with an
input failure." This is better than in C90 where the existence of %n
was ignored, but it is still incomplete: like %n, a white-space
directive also does not require any input and thus should not suffer an
input failure in this case.
As far as I can tell, the handling of end-of-file is specified
(correctly) in paragraphs 5 - 9, so I suggest deleting paragraph 15.
(A parallel situation exists for fwscanf.)
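The %n case can be illustrated with sscanf, where the end of the string plays the role of end-of-file: a %n directive after the last matched input still executes because it consumes no input, and the argument above is that a white-space directive deserves the same treatment. (The helper name below is illustrative.)

```c
#include <stdio.h>

/* Returns how many characters %d consumed, or -1 on matching failure.
   When the input ends right after the number, the trailing %n still
   executes: it requires no input, so end-of-input is not a failure
   for it. */
int digits_consumed(const char *s)
{
    int value;
    int n = -1;

    if (sscanf(s, "%d%n", &value, &n) != 1)
        return -1;
    return n;
}
```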
----------
Randy Meyers
Comment 1.
Category: Editorial change
Committee Draft subsection: 6.11
Title: Conforming implementations should be allowed to provide additional
floating point types
Detailed description:
One of the contentious issues in C89 and C9x is whether an implementation
or even a new standard may provide additional integer types not explicitly
mentioned in C89. C9x has now definitively stated that such types may be
supplied.
However, C9x fails to make any statements about implementation-defined
floating-point types or even warn programmers that a future standard may
include such types. The Future Language Directions Subclause (6.11)
should contain a new Subclause stating:
Future standardization may include additional floating-point
types, including those with greater range, precision, or
both, than long double.
In addition, the committee should grant to conforming C9x implementations
a license to add such types. This might be done by adding a statement to
6.2.5 that "While long double has at least the range and precision of
double and float, other floating point types may provide greater range or
greater precision than long double." (This wording needs work.)
------------------------------------------------------------------
Comment 2.
Category: Editorial change
Committee Draft subsection: 7.20.3.4
Title: Rewrite the description of realloc to state it returns a new
object with the same value.
Detailed description:
Objects in C lack expected properties because realloc goes out of its
way to state that it returns a pointer to the same object, even if the
object has been moved to a new address. There are places in the FCD
that discuss objects that incorrectly imply that an object has a fixed
address.
The easiest way to correct this problem is for the description of realloc
in 7.20.3.4 to state it returns a pointer to a new object. Here is the
suggested new wording (note the wording does not change the behavior of
realloc):
The *realloc* function deallocates the old object pointed to
by *ptr* and returns a pointer to a new object that has the
size specified by *size*. The contents of the new object
shall be the same as the old object before deallocation up
to the lesser of the size of the old object and *size*. Any
bytes in the new object beyond the size of the old object
have indeterminate values.
If *ptr* is a null pointer, the *realloc* function behaves
like the *malloc* function for the specified size.
Otherwise, if *ptr* does not match a pointer earlier
returned by the *calloc*, *malloc*, or *realloc* function,
or if the space has been deallocated by a call to the *free*
or *realloc* function, the behavior is undefined. If memory
for the new object cannot be allocated, the old object is
not deallocated and its value is unchanged.
Returns
The *realloc* function returns a pointer to the new object,
which may have the same value as *ptr*, or a null pointer if
the new object could not be allocated.
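The "old object is not deallocated and its value is unchanged" clause in the proposed wording is exactly what makes the usual defensive idiom work; a sketch (the helper name is illustrative):

```c
#include <stdlib.h>

/* Grow an int array. Assigning through a temporary relies on the
   guarantee that, on failure, the old object is not deallocated and
   its value is unchanged - so nothing is leaked or lost. */
int grow(int **arr, size_t new_count)
{
    int *tmp = realloc(*arr, new_count * sizeof **arr);

    if (tmp == NULL)
        return -1;    /* *arr still points at the old, intact object */
    *arr = tmp;       /* old object deallocated; tmp is the new object */
    return 0;
}
```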
------------------------------------------------------------------
Antoine Leca
Comment 1.
Category: Feature that should be included
Committee Draft subsection: 7.23, 7.25, 7.26
Title: Useless library functions made deprecated
Detailed description:
mktime (7.23.2.3) is entirely subsumed by mkxtime (7.23.2.4).
Similar cases occur with gmtime/localtime (7.23.3.3/7.23.3.4)
vs zonetime (7.23.3.7), strftime (7.23.3.5) vs strfxtime
(7.23.3.6), and wcsftime (7.24.5.1) vs wcsfxtime (7.24.5.2).
The former functions do not add significant value over the latter
ones (in particular, execution times are similar). So if the
latter are to be kept (that is, if the comment below is dropped),
the former should be declared deprecated, to avoid spurious
specifications being carried along for years.
Comment 2.
Category: Feature that should be removed
Committee Draft subsection: 7.23, 7.24.5.2
Title: Removal of struct tmx material in the <time.h> library subclause
Detailed description:
a) The mechanism of tm_ext and tm_extlen is entirely new to the
C Standard, so attention should be paid to the uses that can be
made of it. Unfortunately, the current text is very elliptical about
this use, particularly about the storage of the further members
referred to by 7.23.1p5.
In particular, it is impossible from the current wording to know how
to correctly free a struct tmx object whose tm_ext member is
not NULL, as in the following snippet:
// This first function is OK (provided my understanding is correct).
struct tmx *tmx_alloc(void)   // allocate a new struct tmx object
{
    struct tmx *p = malloc(sizeof(struct tmx));
    if( p == NULL ) handle_failed_malloc("tmx_alloc");
    memset(p, 0, sizeof(struct tmx));   // initialize all members to 0
    p->tm_isdst = -1;
    p->tm_version = 1;
    p->tm_ext = NULL;
    return p;
}
// This second function has a big drawback.
void tmx_free(struct tmx *p)   // free a previously allocated object
{
    if( p == NULL ) return;   // nothing to do
    if( p->tm_ext ) {
        // Some additional members have been added by the implementation
        // or by users' programs using a future version of the Standard;
        // since we do not know what to do, do nothing.
        ;
        // If the members were allocated, they are now impossible to
        // access, so we might clobber the memory pool...
    }
    free(p);
    return;
}
Various fixes might be thought of. Among these, I see:
- always require that allocation of the additional members be in
control of the implementation; this way, programs should never
"free() tm_ext"; effectively, this makes these additional members
of the same status as are currently the additional members that
may be (and are) part of struct tm or struct tmx
- always require these additional objects to be separately dynamically
allocated. This requires that copies between two struct tmx objects
dynamically allocate some memory to hold these objects. In effect,
this will require an additional example highlighting this (perhaps
showing what a tmxcopy(struct tmx*, const struct tmx*)
function might look like).
Both solutions have pros and cons. But it is clear that the current
state, which encompasses both, is not clear enough.
Other examples of potential pitfalls are highlighted below.
b) This extension mechanism might be difficult to use with
implementations that currently add members to struct tm
(_tm_zone, containing a pointer to a string giving the name of the
time zone, and _tm_gmtoff, whose meaning is almost the same as
tm_zone, except that it is 60 times bigger). The latter is particularly
interesting, since it might need tricky kludges to assure the internal
consistency of the struct tmx object (any change to either member
should ideally be applied to the other, yielding potential rounding
problems). Having additional members, accessed through tm_ext,
for example one whose effect duplicates _tm_zone behaviour,
seems awkward seen in this light.
c) 7.23.1p5 states that a positive value for tm_zone means that the
represented broken-down time is ahead of UTC. In the case where
the relationship between the broken-down time and UTC is not known
(thus tm_zone should be equal to _LOCALTIME), tm_zone is therefore
forbidden to be positive. This might deserve a more explicit
requirement in 7.23.1p2.
d) POSIX compatibility, as well as proper support of historical
time zones, will require tm_zone to be a count of seconds instead
of a count of minutes; this will in turn require tm_zone to be
enlarged to long (or to int_least32_t), to handle properly
the minimum requirements.
e) POSIX compatibility might be defeated by the restriction,
set upon Daylight Saving Time algorithms, to actually *advance*
the clocks. This is a minor point, since there is no historical
need, nor any perceived real need, for such a "feature".
f) On implementations that support leap seconds, 7.23.2.2
(difftime) does not specify whether the result should include
(thus considering calendar time to be effectively UTC) or
disregard (thus considering calendar time to be effectively
TAI) leap seconds. This is unfortunate.
g) The requirement set up by 7.23.2.3p4 (that a second call to
mktime should yield the same value and should not modify the
broken-down time) is too restrictive for mktime, because
mktime does not allow complete determination of the calendar
time associated with a given broken-down time. Examples
include the so-called "double daylight saving time" that
was in force in the past, or when the time zone associated
with the time changes relative to UTC.
For example, in Sri Lanka, the clocks moved back from 0:30
to 0:00 on 1996-10-26, permanently. So the timestamp
1996-10-26T00:15:00, tm_isdst=0 is ambiguous when given to
mktime(); and widely deployed implementations exist that use
caches, and thus might deliver either the former or the latter
result on a random basis; this specification will effectively
disallow caching inside mktime, with a big performance hit
for users.
This requirement (the entire paragraph) should be withdrawn.
Anyway, mktime is intended to be superseded by mkxtime, so
there is not much gain trying to improve a function that is
to be declared deprecated.
h) The case where mktime or mkxtime is called with tm_zone set
to _LOCALTIME and tm_isdst negative (unknown), and where
the input moment of time falls inside the "fall back" window, that is
between 1:00 am and 2:00 am on the last Sunday in October
(in the United States), leads to a well-known ambiguity.
Contrary to what might have been expected, this ambiguity
is not resolved by the additions of this revision of the Standard
(either result might be returned): it all boils down to the
sentence in 7.23.2.6, in the algorithm, saying
// X2 is the appropriate offset from local time to UTC,
// determined by the implementation, or [...]
Since there are two possible offsets in this case...
i) Assuming the implementation handles leap seconds, if broken-down
times lying in the future are passed (where leap seconds
cannot de facto be determined), 7.23.2.4p4 (effect of _NO_LEAP_
SECONDS on mkxtime), and in particular the sentence in
parentheses, seems to require that the count of leap seconds
be assumed to be 0. This would be ill-advised; I would
prefer it to be implementation-defined, with the recommended
practice (or requirement) of being 0 for implementations that
do not handle leap seconds.
j) Assuming the implementation handles leap seconds, the effect
of 7.23.2.4p4 is that the "default" behaviour on successive
calls to mkxtime yields a new, strange time scale that is
neither UTC nor TAI. For example (remember that a positive
leap second will be introduced at 1998-12-31T23:59:60Z, in
renewed ISO 8601 notation):
struct tmx tmx = {
    .tm_year=98, .tm_mon=0, .tm_mday=1, .tm_hour=0, .tm_min=0, .tm_sec=0,
    .tm_version=1, .tm_zone=_LOCALTIME, .tm_ext=NULL,
    .tm_leapsecs=_NO_LEAP_SECONDS }, tmx0;
time_t t1, t2;
double delta, days, secs;
char s[SIZEOF_BUFFER];

t1 = mkxtime(&tmx);
puts(ctime(&t1));
if( tmx.tm_leapsecs == _NO_LEAP_SECONDS )
    printf("Unable to determine number of leap seconds applied.\n");
else
    printf("tmx.tm_leapsecs = %d\n", tmx.tm_leapsecs);
tmx0 = tmx;   // !!! may share the object pointed to by tmx.tm_ext...
++tmx.tm_year;
t2 = mkxtime(&tmx);
puts(ctime(&t2));
delta = difftime(t2, t1);
days = modf(delta, &secs);
printf("With ++tm_year: %.7e s == %f days and %f s\n", delta, days, secs);
if( tmx.tm_leapsecs == _NO_LEAP_SECONDS )
    printf("Unable to determine number of leap seconds applied.\n");
else
    printf("tmx.tm_leapsecs = %d\n", tmx.tm_leapsecs);
tmx = tmx0;   // !!! may yield problems if the contents pointed to
              // by tm_ext have been modified by the previous call...
tmx.tm_hour += 24*365;
tmx.tm_leapsecs = _NO_LEAP_SECONDS;
t2 = mkxtime(&tmx);
puts(ctime(&t2));
delta = difftime(t2, t1);
days = modf(delta, &secs);
printf("With tm_hour+=24*365: %.7e s == %f days and %f s\n", delta, days, secs);
if( tmx.tm_leapsecs == _NO_LEAP_SECONDS )
    printf("Unable to determine number of leap seconds applied.\n");
else
    printf("tmx.tm_leapsecs = %d\n", tmx.tm_leapsecs);
Without leap seconds support, results should be consistent and
straightforward; for example (for me in Metropolitan France):
Thu Jan 1 01:00:00 1998
Unable to determine number of leap seconds applied.
Fri Jan 1 01:00:00 1999
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
Unable to determine number of leap seconds applied.
Fri Jan 1 01:00:00 1999
With tm_hour+=24*365: 3.1536000e+07 s == 365 days and 0 s
Unable to determine number of leap seconds applied.
Things may change with leap seconds support; assuming we are in a
time zone behind UTC (e.g. in the United States), the results might be:
Wed Dec 31 21:00:00 1997
tmx.tm_leapsecs = 31
Thu Dec 31 21:00:00 1998
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
Thu Dec 31 21:00:00 1998
With tm_hour+=24*365: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
But with a time zone ahead of UTC, results might be
Thu Jan 1 01:00:00 1998
tmx.tm_leapsecs = 31
Thu Dec 31 00:59:59 1998
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
Fri Jan 1 01:00:00 1999
With tm_hour+=24*365: 3.1536001e+07 s == 365 days and 1 s
tmx.tm_leapsecs = 32
And if the time zone is set to UTC, results might be
Thu Jan 1 00:00:00 1998
tmx.tm_leapsecs = 31
Thu Dec 31 23:59:60 1998
With ++tm_year: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
Fri Jan 1 00:00:00 1999
With tm_hour+=24*365: 3.1536001e+07 s == 365 days and 1 s
tmx.tm_leapsecs = 32
or, for the last three lines,
Thu Dec 31 23:59:60 1998
With tm_hour+=24*365: 3.1536000e+07 s == 365 days and 0 s
tmx.tm_leapsecs = 31
The last result is questionable, since both choices are allowed by
the current text (the result falls right on the leap second itself).
Moreover, implementations with caches might return either one on a
random basis...
Bottom line: the behaviour is surprising, to say the least.
k) 7.23.2.6p2 (maximum ranges on input to mkxtime) uses
submultiples of LONG_MAX to constrain the members' values. Apart
from the fact that the limits given could easily be made larger in
general, this has some defects:
- the constraint disallows the common use, on POSIX boxes, of tm_sec
as the unique input member, set to a POSIX-style time_t value;
- the constraints are very awkward for implementations where
long ints are bigger than "normal" ints: on such platforms, all
members must first be converted to long before any operation
takes place;
- since there are eight main input fields, plus a ninth (tm_zone)
which is further constrained to be between -1439 and +1439, the
result might nevertheless overflow, so special provisions to
handle overflow are needed in any event.
l) There is an obvious (and already known) typo in the description
of D, regarding the handling of years that are multiples of 100. Also,
this definition should use QUOT and REM instead of / and %.
m) Footnote 252 introduces the use of these library functions with
the so-called proleptic Gregorian calendar, that is, the rules for
the Gregorian calendar applied to any year, even before Gregory's
reform. This seems to contradict 7.23.1p1, which says that calendar
time's dates are relative to the Gregorian calendar, thus tm_year
should in any case be greater than -318. If this is the intent, another
footnote in 7.23.1p1 might be worthwhile. Another way is to rewrite
7.23.1p1 to say something like "(according to the rules of the
Gregorian calendar)". See also point l) above.
n) The static status of the result returned by localtime and gmtime
(which is annoying, but that is another story) is clearly set up
by 7.23.3p1. However, this does not scale well to zonetime, given
that this function might in fact return two objects: a struct tmx,
and an optional object containing additional members, pointed to
by tmx->tm_ext.
If the latter is to be static, this might yield problems with mkxtime
as well, since 7.23.2.4p5 states that the broken-down time "produced"
by mkxtime is required to be identical to the result of zonetime.
(This will effectively require that the tm_ext member always point
to a static object held by the implementation; if that is the
original intent, please state it clearly.)
o) There is a direct contradiction between the algorithm given for
asctime in 7.23.3.1, which will overflow if tm_year is not in the
range -11899 to +8099, and the statements in 7.23.2.6 that intend
to encompass a broader range.
All of these points argue for a larger overhaul of this part
of the library. Such a job has been initiated recently, as
Technical Committee J11 is aware. In the meantime, I suggest
dropping all these new features from the current revision of
the Standard.
It means in effect:
i) removing subclauses 7.23.2.4 (mkxtime), 7.23.2.6 (normalization),
7.23.3.6 (strfxtime), 7.23.3.7 (zonetime), 7.24.5.2 (wcsfxtime).
ii) removing paragraphs 7.23.2.3p3 and 7.23.2.3p4 (references to
7.23.2.6),
iii) the macros _NO_LEAP_SECONDS and _LOCALTIME in 7.23.1p2 should be
removed, as they are becoming useless. Same holds for struct tmx
in 7.23.1p3.
iv) 7.23.1p5 (definition of struct tmx) should also be removed, as it
is becoming useless too.
----------
Douglas A. Gwyn (IST)
Comment 1.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 4
Title: Clarify ``shall accept any strictly conforming program''
Detailed description:
There seems to be widespread confusion over the meaning of the phrase
``shall accept any strictly conforming program''.
I suggest the following changes:
1) In paragraph 6, change the two occurrences of:
accept any strictly conforming program
to:
successfully translate any strictly conforming program that does
not exceed its translation limits
(Editorially, two commas should also be added in the list of
qualifications after the second instance of this replacement.)
Note that this does not imply that the implementation's translation
limits are exactly the set listed in subsection 5.2.4.1.
2) Add a forward reference:
translation limits (5.2.4.1).
------------------------------------------------------------------------
Comment 2.
Category: Feature that should be removed
Committee Draft subsection: 4
Title: Conforming program is useless concept
Detailed description:
A ``conforming program'' is one that just happens to work under some
implementation. This concept is of no value in a standards context.
I suggest the following changes:
1) Delete paragraph 7:
A *conforming program* is one that is acceptable to a conforming
implementation.
2) Delete the second sentence of footnote 4:
Conforming programs may depend upon nonportable features of a
conforming implementation.
3) Move the remaining sentence of footnote 4 into footnote 2, before
the existing text in footnote 2:
2) Strictly conforming programs are intended to be maximally
portable among conforming implementations. A strictly
conforming program can use conditional features ...
------------------------------------------------------------------------
Comment 3.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 5.1.2.2.1, 5.1.2.2.3
Title: Clean up interface to main()
Detailed description:
There is no need for the confusion caused by there being two distinct
interfaces for the function ``main''. The ``one true interface'' should
be the one that is specified. Implementations can always support
alternate interfaces in addition to whatever is specified.
Also, it would help in answering questions about the execution context
for the function ``main'', for example in connection with atexit-
registered functions, if main were specified as a normal C function
invoked in a well-defined manner.
I suggest the following changes:
1) In 5.1.2.2.1 paragraph 1, change the final sentence to:
It shall be defined with a return type of int and two parameters
(referred to here as argc and argv, though any names may be
used):
int main(int argc, char *argv[]) { /* ... */ }
or equivalent.(8)
2) In 5.1.2.2.1 paragraph 2, delete:
If they are declared,
(Editorially, the new first word of that sentence should be
capitalized: ``The parameters ...'')
3) Replace 5.1.2.2.3 paragraph 1 with:
After its arguments are set up at program startup, the main
function is called by the execution environment as follows:
exit(main(argc, argv));
Thus the value returned by the main function is used as the
termination status argument to the exit function.
Note that main is now a normal function, so a missing return value is
not automatically supplied as 0. This is good, as it discourages
sloppiness (and reflects how many implementations have historically
behaved in such situations).
------------------------------------------------------------------------
Comment 4.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.3.2.3
Title: Support writing explicit null pointer values
Detailed description:
A construct such as (int *)0 should represent a null pointer value of
the obvious type.
I suggest the following change:
1) In paragraph 3, change:
assigned to or compared for equality to
to:
assigned to, converted to, or compared for equality to
------------------------------------------------------------------------
Comment 5.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.4.2.2
Title: Define __func__ outside a function
Detailed description:
There is no specification for __func__ when not lexically enclosed by
any function.
I suggest the following change:
1) Append to paragraph 1:
If there is no lexically-enclosing function, an empty
*function-name* is used.
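Inside a function body the behaviour is already settled; a minimal sketch of the settled case (the suggestion above extends this to uses outside any function, where __func__ would be an empty name):

```c
/* __func__ names the lexically enclosing function. */
const char *current_name(void)
{
    return __func__;   /* "current_name" */
}
```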
------------------------------------------------------------------------
Comment 6.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.8.5.3
Title: Fix ``for'' statement specification
Detailed description:
There are several problems with the current specification of the ``for''
statement, as discussed in the recent Santa Cruz meeting. For example,
the ``equivalent'' sequence of statements cannot be considered a simple
textual substitution, and the introduction of additional levels of
braces (compound statements) raises issues in connection with the
translation limit minimum requirements in subsection 5.2.4.1.
I believe other committee members have devised a suitable correction;
this comment is merely to ensure that the issue is not overlooked.
------------------------------------------------------------------------
Comment 7.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 6.10.3.3, 6.10.3.5
Title: Placemarker preprocessing tokens are temporary
Detailed description:
The reason that placemarker preprocessing tokens do not appear in the
formal grammar is that they are temporary markers created by the
implementation. This could be made clearer.
Also, the introductory sentence to EXAMPLE 5 is misleading.
I suggest the following changes:
1) At the end of 6.10.3.3 paragraph 2, attach a footnote:
(134.5) Placemarker preprocessing tokens do not appear in the
formal grammar, because they are temporary entities created by
the implementation during translation phase 4 which vanish
before translation phase 5.
2) In EXAMPLE 5 after subsection 6.10.3.5, replace:
To illustrate the rules for
placemarker ## placemarker
the sequence
with:
to illustrate the rules for placemarkers, the sequence
------------------------------------------------------------------------
Comment 8.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.2.1.1
Title: Argument to assert is not necessarily _Bool
Detailed description:
There is a general problem with specifying library macros via Synopses
that pretend they are functions within the type system, but it is
worst for the assert macro, so that's what I most want to see fixed.
I suggest the following changes:
1) In paragraph 1 (Synopsis), remove:
_Bool
2) In the second sentence of paragraph 2, change:
if expression is false (that is, compares equal to 0)
to:
if expression compares equal to 0
------------------------------------------------------------------------
Comment 9.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.18.1.1
Title: Specify size of exact-width integer types
Detailed description:
The utility of the exact-width integer types in strictly conforming
programs would be significantly enhanced, especially for the signed
variety, if certain additional properties were specified. The following
suggestion seems to match the actual uses to which these types are
typically put.
I suggest the following change:
1) Append to the first sentence of paragraph 1:
and contains no padding bits
------------------------------------------------------------------------
Comment 10.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.18.1.1
Title: Specify twos-complement for exact-width integer types
Detailed description:
The utility of the exact-width integer types in strictly conforming
programs would be significantly enhanced, especially for the signed
variety, if certain additional properties were specified. The following
suggestion seems to match the actual uses to which these types are
typically put.
I suggest the following change:
1) Insert the following after the first sentence of paragraph 1:
Further, the sign bit of an exact-width signed integer
type shall represent the value -2^n.
NOTE: This says it must have a twos-complement representation, which is
most convenient for implementing multiple-precision arithmetic and for
matching most externally-imposed formats.
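A sketch of the predictability this buys, additionally assuming that conversion of an out-of-range value to the signed type wraps, as it does on common two's-complement implementations (the conversion itself remains implementation-defined in the FCD):

```c
#include <stdint.h>

/* With int8_t exactly 8 bits, padding-free and two's-complement, a raw
   byte maps onto a signed value predictably. The conversion below is
   implementation-defined in general; common implementations wrap. */
int8_t from_raw_byte(uint8_t b)
{
    return (int8_t)b;
}
```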
------------------------------------------------------------------------
Comment 11.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.18.1.1
Title: Require universal support for exact-width integer types
Detailed description:
The utility of the exact-width integer types in strictly conforming
programs would be significantly enhanced if they were always available.
I suggest the following change:
1) In paragraph 2, change:
(These types need not exist in an implementation.)
to:
These types shall exist in every conforming implementation.
NOTE: This will impose a burden on some implementations, for example
one where all integer types larger than char are 64 bits wide. The
justification for this is that any strictly conforming program that
needs one of these types would otherwise have to allow for its
nonexistence, requiring considerable code to simulate the missing
type. But that's exactly the sort of thing that a compiler ought to take
care of.
------------------------------------------------------------------------
Comment 12.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 7.20.2.2
Title: Delete example implementation of rand function
Detailed description:
The portable implementation of the rand and srand functions has been
criticised as being of inferior quality and as appearing to be a
recommendation for the actual implementation.
I suggest the following change:
1) Remove paragraph 5 (EXAMPLE).
NOTE: The example should be moved into the Rationale document.
------------------------------------------------------------------------
Comment 13.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.20.4.2
Title: Forbid atexit or exit call from registered function
Detailed description:
Some implementations could have trouble supporting registration of
new functions during processing of the registered-function list at
normal program termination.
Also, atexit-registered functions ought not to (recursively) invoke
the exit function.
I suggest the following change:
1) Append to paragraph 2:
If a registered function executes a call to the atexit
or exit functions, the behavior is undefined.
------------------------------------------------------------------------
Comment 14.
Category: Normative change to intent of existing feature
Committee Draft subsection: 7.23.1, 7.23.2.4, 7.23.2.6
Title: Remove struct tmx and recipe for normalizing broken-down times
Detailed description:
There are several problems with the current specification of broken-
down times, as discussed in the recent Santa Cruz meeting. There are
apparently errors in the algorithm given in subsection 7.23.2.6
paragraph 3, and groups working in this general area have complained
that struct tmx is incomplete and not consistent with the approach
they are developing.
I believe the committee has in principle already agreed to revert to
the existing standard in this area; this comment is merely to ensure
that the issue is not overlooked.
----------
John Hauser
Comment 1.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 7.12.8.4 and F.9.5.4
Title: tgamma(0) should equal 1/0
Detailed description:
Problem:
Section F.9.5.4 states:
-- tgamma(x) returns a NaN and raises the invalid exception if x is
a negative integer or zero.
However, when x is zero, tgamma(x) should equal 1/x, which is not
ambiguous under IEEE Standard (IEC 60559) arithmetic.
Fix:
Modify Section F.9.5.4 to specify:
-- tgamma(+/-0) returns +/-infinity.
-- tgamma(x) returns a NaN and raises the invalid exception if x is
a negative integer.
Correspondingly, adjust the description of tgamma in Section 7.12.8.4
to say:
A domain error occurs if x is a negative integer or if the result
cannot be represented when x is zero.
------------------------------------------------------------------
Comment 2.
Category: Editorial change/non-normative contribution ?
Committee Draft subsection: 7.3.4
Title: ``usual mathematical formulas'' vague
Detailed description:
Problem:
In Section 7.3.4 it is explained that when the state of the
CX_LIMITED_RANGE pragma is on, an implementation may use the ``usual
mathematical formulas'' to implement complex multiply, divide,
and absolute value. Unfortunately, the term ``usual mathematical
formulas'' is unnecessarily vague. A footnote specifies the intended
formulas, but footnotes are not normative.
Fix:
Fold Footnote 151 into the text by stating explicitly that when the
state of the CX_LIMITED_RANGE macro is on, complex multiply, divide,
and absolute value may be implemented by the specified formulas.
------------------------------------------------------------------
Comment 3.
Category: Editorial change/non-normative contribution
Committee Draft subsection: 7.3.5.4
Title: ``function'' should be ``functions''
Detailed description:
In Section 7.3.5.4 (The ccos functions), change ``function computes''
to ``functions compute''.
------------------------------------------------------------------
Comment 4.
Category: Editorial change/non-normative contribution
Committee Draft subsection: G.5.2.2
Title: Missing plus sign
Detailed description:
In Section G.5.2.2 (The casinh functions), change
``casinh(infinity+i*infinity)'' to ``casinh(+infinity+i*infinity)''.
------------------------------------------------------------------
Comment 5.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: G.5.2.3
Title: catanh(1+i0) not specified
Detailed description:
Problem:
Section G.5.2.3 neglects to specify the result for catanh(1+i0).
Possible fix:
Add to Section G.5.2.3 the rule:
-- catanh(1+i0) returns +infinity+i*NaN and raises the invalid
exception.
------------------------------------------------------------------
Comment 6.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: G.5.2.4
Title: ccosh(+inf+i*inf) inconsistent with other functions
Detailed description:
Problem:
Section G.5.2.4 specifies that
-- ccosh(+infinity+i*infinity) returns +infinity+i*NaN and raises
the invalid exception.
But ccosh(+infinity+i*infinity) is not necessarily in the positive real
half-plane; any complex infinity is an equally plausible result. For a
nearly identical case, Section G.5.2.5 requires:
-- csinh(+infinity+i*infinity) returns +/-infinity+i*NaN (where the
sign of the real part of the result is unspecified) and raises
the invalid exception.
Other such cases are similarly stipulated throughout Section G.5.
Fix:
In Section G.5.2.4, specify the ccosh case to be like csinh (and
others):
-- ccosh(+infinity+i*infinity) returns +/-infinity+i*NaN (where the
sign of the real part of the result is unspecified) and raises
the invalid exception.
------------------------------------------------------------------
Comment 7.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: G.5.2.6
Title: Special cases of ctanh incorrect
Detailed description:
Problem:
Section G.5.2.6 specifies:
-- ctanh(+infinity+i*y) returns 1+i0, for all positive-signed
numbers y.
-- ctanh(x+i*infinity) returns NaN+i*NaN and raises the invalid
exception, for finite x.
However, working from the formula
                  sinh(2x)             sin(2y)
 tanh(x+iy) = ---------------- + i ----------------
              cosh(2x)+cos(2y)     cosh(2x)+cos(2y)
it appears that the above cases have not been defined with the same
care applied to other functions in Section G.5. In particular,
ctanh(+infinity+i*y) should be 1+i0*sin(2y), and ctanh(+0+i*infinity)
can be more accurately defined as +0+i*NaN.
Fix:
Modify Section G.5.2.6 to say:
-- ctanh(+infinity+i*y) returns 1+i0*sin(2y), for positive-signed
finite y.
-- ctanh(+infinity+i*infinity) returns 1+/-i0 (where the sign of
the imaginary part of the result is unspecified).
-- ctanh(+0+i*infinity) returns +0+i*NaN and raises the invalid
exception.
-- ctanh(x+i*infinity) returns NaN+i*NaN and raises the invalid
exception, for finite nonzero x.
-- ctanh(+0+i*NaN) returns +0+i*NaN.
-- ctanh(x+i*NaN) returns NaN+i*NaN and optionally raises the
invalid exception, for finite nonzero x.
------------------------------------------------------------------
Comment 8.
Category: Editorial change/non-normative contribution
Committee Draft subsection: G.5.3.2
Title: Missing minus sign
Detailed description:
In Section G.5.3.2 (The clog functions), change ``clog(0+i0)'' to
``clog(-0+i0)''.
------------------------------------------------------------------
Comment 9.
Category: Normative change to intent of existing feature
Committee Draft subsection: G.5
Title: Directions of complex zeros and infinities
Detailed description:
Problem:
Throughout Section G.5 it is assumed that the following complex zero
and infinity values have these directions in the complex plane:
-0-i0                   -pi
-0+i0                   +pi
+0-i0                   -0
+0+i0                   +0
-infinity-i*infinity    -3pi/4
-infinity+i*infinity    +3pi/4
+infinity-i*infinity    -pi/4
+infinity+i*infinity    +pi/4
Unfortunately, none of these directions is mathematically determinate,
and the consequence is that calculations are made a little more
susceptible to returning incorrect results without warning. The effect
of this set of assumptions on the properties of the complex functions
is similar to what would result from assuming, for example, that
+infinity/+infinity = 1. (In fact, the assumption that the direction
of +infinity+i*infinity is +pi/4 is virtually equivalent to the
assumption that +infinity/+infinity = 1.)
It is often claimed that the explanation for these choices can be found
in W. Kahan's paper, ``Branch Cuts for Complex Elementary Functions'',
published in the Proceedings of the Joint IMA/SIAM Conference on the
State of the Art in Numerical Analysis, 1987. But after correctly
observing that the direction of +infinity+i*infinity is not determinate
(it's ``some fixed but unknown theta strictly between 0 and pi/2''),
Kahan turns around and merely _asserts_ the directions listed above,
without justification. The paper on which the choices above are
probably based supplies _no_argument_ for them.
The risks caused by these choices might be tolerable if they allowed
information about angle to be usefully preserved through at least some
calculations. Unfortunately, a somewhat tedious examination of the
basic operations and functions (addition, multiplication, clog, etc.)
shows that there is almost no benefit. The ``convenient values''
either quickly disappear into NaNs (harmless) or they soon become
arbitrary (not so harmless). There is rarely a case where they
preserve any useful information. Since the benefits do not appear
to be worth the risks, this approach ought not to be adopted by the
C Standard.
Preferred fix:
As has been independently proposed in other public comments, first
fix the atan2 function so that a domain error occurs for atan2(x,y)
when x and y are both zero or both an infinity.
In Section G.5.1.1, substitute the rule:
-- cacos(+/-infinity+i*infinity) returns NaN-i*infinity and raises
the invalid exception.
In Section G.5.2.1, substitute:
-- cacosh(+/-infinity+i*infinity) returns +infinity+i*NaN and
raises the invalid exception.
In Section G.5.2.2, substitute:
-- casinh(+infinity+i*infinity) returns +infinity+i*NaN and raises
the invalid exception.
And in Section G.5.3.2, substitute the rules:
-- clog(+/-0+i0) returns -infinity+i*NaN and raises the invalid
exception.
-- clog(+/-infinity+i*infinity) returns +infinity+i*NaN and raises
the invalid exception.
Alternative fix:
[This proposal may look complicated, but it actually involves fairly
benign substitutions in the Standard.]
If the preferred fix cannot be adopted, the safer form should at least
be permitted. In Annex G, make it implementation-defined whether the
carg function returns NaN or a finite value for zeros and infinities.
If it returns a finite value then
carg(-0-i0) -> -pi
carg(-0+i0) -> +pi
carg(+0-i0) -> -0
carg(+0+i0) -> +0
carg(-infinity-i*infinity) -> -3pi/4
carg(-infinity+i*infinity) -> +3pi/4
carg(+infinity-i*infinity) -> -pi/4
carg(+infinity+i*infinity) -> +pi/4
as currently required. Otherwise, carg returns NaN and raises the
invalid exception for these cases.
In Section G.5.1.1, replace the four rules:
-- cacos(-infinity+i*infinity) returns 3pi/4-i*infinity.
-- cacos(+infinity+i*infinity) returns pi/4-i*infinity.
-- cacos(x+i*infinity) returns pi/2-i*infinity, for finite x.
-- cacos(NaN+i*infinity) returns NaN-i*infinity.
with the single rule:
-- cacos(x+i*infinity) evaluates as carg(x+i*infinity)-i*infinity,
for all x (including NaN).
In Section G.5.2.1, replace:
-- cacosh(-infinity+i*infinity) returns +infinity+i*3pi/4.
-- cacosh(+infinity+i*infinity) returns +infinity+i*pi/4.
-- cacosh(x+i*infinity) returns +infinity+i*pi/2, for finite x.
-- cacosh(NaN+i*infinity) returns +infinity+i*NaN.
with:
-- cacosh(x+i*infinity) evaluates as +infinity+i*carg(x+i*infinity)
for all x (including NaN).
In Section G.5.2.2, replace:
-- casinh(+infinity+i*infinity) returns +infinity+i*pi/4.
-- casinh(x+i*infinity) returns +infinity+i*pi/2, for
positive-signed finite x.
with:
-- casinh(x+i*infinity) evaluates as +infinity+i*carg(x+i*infinity)
for all positive-signed numbers x.
And in Section G.5.3.2, replace:
-- clog(-0+i0) returns -infinity+i*pi and raises the divide-by-zero
exception.
-- clog(+0+i0) returns -infinity+i0 and raises the divide-by-zero
exception.
-- clog(-infinity+i*infinity) returns +infinity+i*3pi/4.
-- clog(+infinity+i*infinity) returns +infinity+i*pi/4.
-- clog(x+i*infinity) returns +infinity+i*pi/2, for finite x.
with:
-- clog(-0+i0) evaluates as -infinity+i*carg(-0+i0) and, if the
imaginary part of the result is not NaN, also raises the
divide-by-zero exception.
-- clog(+0+i0) evaluates as -infinity+i*carg(+0+i0) and, if the
imaginary part of the result is not NaN, also raises the
divide-by-zero exception.
-- clog(x+i*infinity) evaluates as +infinity+i*carg(x+i*infinity)
for all x (including NaN).
----------
Paul Eggert
Comment 1.
Category: Normative change to existing feature retaining the original
intent
Committee Draft subsection: 6.10.3.2
Title: Stringizing a string containing a UCN should have specified
behavior
Detailed description:
The semantics of UCNs have improved greatly in the latest draft, but I
noticed one glitch. Section 6.10.3.2 says that ``it is unspecified
whether a \ character is inserted before the \ character beginning a
universal character name''. This lack of specification means that one
cannot reliably convert from a program using an extended source
character set to one using only UCNs (or vice versa), as this might
change the program's behavior.
For example, suppose @ represents MICRO SIGN (code 00B5). Then
`assert (strcmp (unit, "@") != 0)' might change its behavior if the
program is converted to UCN form `assert (strcmp (unit, "\u00B5") !=
0)', since (if the assertion fails) the former's output will contain
"@", but for the latter it is unspecified whether the output will
contain "@" or the six characters "\u00B5".
Problems like these might be manageable if the programmer has control
over (or at least can inspect) all macro definitions. But they become
unmanageable if code is intended to be portable to all implementations
and/or libraries that supply macros that might stringify their
arguments, as there's no convenient way for the programmer to
determine whether the macros might change their behavior if the
program is converted to or from UCN form.
For example, the Solaris 2.6 header files for tracing threads have
macros that stringify their arguments, but most users of these macros
don't know this fact. With 6.10.3.2's current wording, such users are
at risk when they convert their program to or from UCN form.
A consequence of the current wording is that careful programmers will
have to avoid passing strings containing UCNs or extended source chars
to any macro (or possible macro) not under the programmer's control.
This is error-prone and restrictive.
I realize that the current lack of specification is to allow
implementations that keep UCNs (or convert all extended chars to UCNs)
during processing, but for such implementations it's a very small
overhead to avoid prepending \ to UCNs when stringizing strings. It's
well worth doing this to standardize behavior and make it easier to
port programs.
Suggestion:
In 6.10.3.2 paragraph 2, change ``it is unspecified whether a \
character is inserted before the \ character beginning a universal
character name'' to ``a \ character is not inserted before the \
character beginning a universal character name''.
------------------------------------------------------------------
Comment 2.
Category: Feature that should be removed
Committee Draft subsection: 7.23
Title: Remove struct tmx and associated functions
Detailed description:
Comment 14 in US0011 (1998-03-04) discusses several problems in the
struct-tmx-related changes by Committee Draft 1 (CD 1) to <time.h>.
Unfortunately, many of these problems remain in the final committee
draft (FCD), and I've since learned of other problems. I summarize
these remaining problems in Appendix 1 below.
Also, Clive Feather (who I understand is responsible for most of the
<time.h> changes in CD 1 and FCD) has proposed that a new <time.h>
section be written to address these problems. I welcome this
proposal, and would like to contribute. However, I believe that it's
too late in the standardization process to introduce major
improvements to <time.h>, as there will be insufficient time to gain
implementation experience with these changes, experience that is
needed for proper review.
Instead, I propose that <time.h>'s problems be fixed by removing the
struct-tmx-related changes to <time.h>, reverting to the current
ISO C standard (C89); we can then come up with a better <time.h> for
the next standard (C0x). In other words, I propose the following:
* Change <time.h> to define only the types and functions that
were defined in C89's <time.h>, and to remove a new requirement
on mktime. Appendix 2 gives the details.
* Work with Clive Feather and other interested parties to
write and test a revised <time.h> suitable for inclusion in C0x.
Please let me know of any way that I can further help implement this
proposal.
------------------------------------------------------------
Appendix 1. Problems in the struct-tmx-related part of <time.h>
Here is a summary of technical problems in the struct-tmx-related part
of FCD (1998-08-03), section 7.23. The problems fall into two basic
areas:
* struct tmx is not headed in the right direction.
The struct-tmx-related changes do not address several well-known
problems with C89 <time.h>, and do not form a good basis for
addressing these problems. These problems include the following.
- Lack of precision. The standard does not require precise
timekeeping; typically, time_t has only 1-second precision.
- Inability to determine properties of time_t. There's no
portable way to determine the precision or range of time_t.
- Poor arithmetic support for the time_t type. difftime is not
enough for many practical applications.
- The new interface is not reentrant. A common extension to C89
is the support of reentrant versions of functions like
localtime. This extension is part of POSIX.1. There's no good
reason (other than historical practice) for time-related
functions to rely on global state; any new extensions should be
reentrant.
- No control over time zones. There's no portable way for an
application to inquire about the time in New York, for example,
even if the implementation supports this.
- Missing conversions. There's no way to convert between UTC and TAI,
or between times in different time zones, or to determine which time
zone is in use.
- No reliable interval time scale. If the clock is adjusted to keep
in sync with UTC, there's no reliable way for a program to ignore
this change.
- One cannot apply strftime to the output of gmtime,
as the %Z and %z formats may be misinterpreted.
(Credit: I've borrowed many of the above points from discussions by
Clive Feather and Markus Kuhn.)
* struct tmx has several technical problems of its own.
Even on its own terms, struct tmx has several technical problems
that would need to be fixed before being made part of a standard.
These problems include the following.
- In 7.23.1 paragraph 5, struct tmx's tm_zone member counts
minutes. This disagrees with common practice, which is to
extend struct tm by adding a new member tm_gmtoff that is UTC
offset in seconds. The extra precision is needed to support
historical time stamps -- UTC offsets that were not a multiple of
one minute used to be quite common, and in at least one locale
this practice did not die out until 1972.
- The tm_leapsecs member defined by 7.23.1 paragraph 5 is an integer,
but it is supposed to represent TAI - UTC, and this value is not
normally an integer for time stamps before 1972. Also, it's not
clear what this value should be for historical time stamps
before the introduction of TAI in the 1950s.
- The tm_ext and tm_extlen members defined by 7.23.1 paragraph 5
use a new method to allow for future extensions. This method
has never before been tried in the C Standard, and is likely to
lead to problems in practice.
For example, the draft makes no requirement on the storage
lifetime of storage addressed by tm_ext. This means that an
application cannot reliably dereference the pointer returned by
zonetime, because it has no way of knowing when the tm_ext
member points to freed storage.
- 7.23.2.3 paragraph 4 adds the following requirement for mktime
not present in C89:
If the call is successful, a second call to the mktime
function with the resulting struct tm value shall always leave
it unchanged and return the same value as the first call.
This requirement was inspired by the struct-tmx-related changes
to <time.h>, but it requires changes to existing practice, and
it cannot be implemented without hurting performance or breaking
binary compatibility.
For example, suppose I am in Sri Lanka, and invoke mktime on the
equivalent of 1996-10-26 00:15:00 with tm_isdst==0. There are
two distinct valid time_t values for this input, since Sri Lanka
moved the clock back from 00:30 to 00:00 that day, permanently.
There is no way to select the time_t by inspecting tm_isdst,
since both times are standard time.
On examples like these, C89 allows mktime to return different
time_t values for the same input at different times during the
execution of the program. This is common existing practice,
but it is prohibited by this new requirement.
It's possible to satisfy this new requirement by adding a new
struct tm member, which specifies the UTC offset. However, this
would break binary compatibility. It's also possible to satisfy
this new requirement by always returning the earlier time_t
value in ambiguous cases. However, this can greatly hurt
performance, as it's not easy for some implementations to
determine that the input is ambiguous; it would require scouting
around each candidate returned value to see whether the value
might be ambiguous, and this step would be expensive.
- The limits on ranges for struct tmx members in 7.23.2.6
paragraph 2 are unreasonably tight. For example, they disallow
the following program on a POSIX.1 host with a 32-bit `long',
since `time (0)' currently returns values above 900000000 on
POSIX.1 hosts, which is well above the limit LONG_MAX/8 ==
268435455 imposed by 7.23.2.6.
    #include <stdio.h>
    #include <time.h>
    struct tmx tm;
    int main()
    {
        char buf[1000];
        time_t t = 0;
        /* Add current time to POSIX.1 epoch, using mkxtime. */
        tm.tm_version = 1;
        tm.tm_year = 1970 - 1900;
        tm.tm_mday = 1;
        tm.tm_sec = time (0);
        if (mkxtime (&tm) == (time_t) -1)
            return 1;
        strfxtime (buf, sizeof buf, "%Y-%m-%d %H:%M:%S", &tm);
        puts (buf);
        return 0;
    }
The limits in 7.23.2.6 are not needed. A mktime implementation
need not check for overflow on every internal arithmetic
operation; instead, it can cheaply check for overflow by doing a
relatively simple test at the end of its calculation.
- 7.23.2.6 paragraph 3 contains several technical problems:
. In some cases, it requires mkxtime to behave as if each day
contains 86400 seconds, even if the implementation supports
leap seconds. For example, if the host supports leap seconds
and uses Japan time, then using mkxtime to add 1 day to
1999-01-01 00:00:00 must yield 1999-01-01 23:59:59, because
there's a leap second at 08:59:60 that day in Japan. This
is not what most programmers will want or expect.
. The explanation starts off with ``Values S and D shall be
determined as follows'', but the code that follows does not
_determine_ S and D; it consults an oracle to find X1 and
X2, which means that the code merely places _constraints_ on
S and D. A non-oracular implementation cannot in general
determine X1 and X2 until it knows S and D, so the code,
if interpreted as a definition, is a circular one.
. The code suffers from arithmetic overflow problems. For
example, suppose tm_hour == INT_MAX && INT_MAX == 32767.
Then tm_hour*3600 overflows, even though tm_hour satisfies
the limits of paragraph 2.
. The code does not declare the types of SS, M, Y, Z, D, or S,
thus leading to confusion. Clearly these values cannot be
of type `int', due to potential overflow problems like the
one discussed above. It's not clear what type would suffice.
. The definition for QUOT yields numerically incorrect results
if either (b)-(a) or (b)-(a)-1 overflows. Similarly, REM
yields incorrect results if (b)*QUOT(a,b) overflows.
. The expression Y*365 + (Z/400)*97 + (Z%400)/4 doesn't match
the Gregorian calendar, which has special rules for years
that are multiples of 100.
. The code is uncommented, so it's hard to understand and evaluate.
For example, the epoch (D=0, S=0) is not described; it
appears to be (-0001)-12-31 Gregorian, but this should be
cleared up.
- 7.23.3.7 says that the number of leap seconds is the ``UTC-UT1
offset''.
It should say ``TAI - UTC''.
------------------------------------------------------------
Appendix 2. Details of proposed change to <time.h>
Here are the details about my proposed change to <time.h>. This
change reverts the <time.h> part of the standard to define only the
types, functions, and macros that were defined in C89's <time.h>.
It also removes the hard-to-implement requirement in 7.23.2.3 paragraph 4.
* 7.23.1 paragraph 2. Remove the macros _NO_LEAP_SECONDS and
_LOCALTIME.
* 7.23.1 paragraph 3. Remove the type `struct tmx'.
* 7.23.1 paragraph 5 (struct tmx). Remove this paragraph.
* 7.23.2.3 paragraph 3 (mktime normalization). Remove this paragraph.
* 7.23.2.3 paragraph 4. Remove the phrase ``and return the same
value''.
It's not feasible to return the same value in some cases;
see the discussion of 7.23.2.3 paragraph 4 above.
* 7.23.2.4 (mkxtime). Remove this section.
* 7.23.2.6 (normalization of broken-down times). Remove this section;
this means footnote 252 will be removed.
* 7.23.3 paragraph 1. Remove the reference to strfxtime.
* 7.23.3.6 (strfxtime). Remove this section.
* 7.23.3.7 (zonetime). Remove this section.
-- end of USA comments
____________________ end of SC22 N2872 ________________________________