Submitter: Fred J. Tydeman
Submission Date: 2016-03-24
Document: WG14 N2032
Summary
"white-space character" is defined differently in two places in the standard.
white-space character is defined in 6.4 as:
space, horizontal tab, new-line, vertical tab, form-feed
standard white-space character is defined in 7.4.1.10 for isspace() as:
space, horizontal tab, new-line, vertical tab, form-feed, carriage return
One place it matters is 7.21.6.2 fscanf().
7.21.6.2 fscanf() in paragraph 5 talks about white-space character(s) in the directive. Since there is no reference to isspace, it must be referring to 6.4 (which I believe is wrong).
Paragraph 8, in the same section, talks about input white-space characters, but refers to isspace.
In the following code, the \r (carriage return) is a directive:
#include <stdio.h>

int main(void){
  static int rc, cnt1, cnt2, i;
  rc = sscanf( " 123", "\r%n%i%n", &cnt1, &i, &cnt2);
  printf("rc=%i, cnt1=%i, i=%i, cnt2=%i\n", rc, cnt1, i, cnt2);
  return 0;
}
Is the \r a white-space character or an ordinary multibyte character?
By 5.2.1#3, the \r is part of the basic execution character set, but is not part of the basic source character set (as Doug Gwyn pointed out in message 14152).
By 6.4#3, the \r is not a white-space character.
By 7.21.6.2#3, #5 and #6, since the \r is not a white-space character, it is an ordinary multibyte character. Therefore, since the \r does not match the first character of the input (a space), the directive fails and cnt1, i, and cnt2 are not altered. However, this is not what most implementations do. They output: rc=1, cnt1=1, i=123, cnt2=4
I see a mismatch between what implementations are doing and what the standard requires.
Another issue is "white space" in 7.21.6.2#15. It should be "white-space characters". Section 6.4 defines "white space" as both comments and white-space characters. So, the use of "white space" in 7.21.6.2#15 would cause /* comments */ to be matched. The same issue applies to 7.29.2.2#15, 7.29.4.1.2#4, 7.22.1.4#4.
Another issue is "white-space wide character" is not well defined (in 7.30.2.1.10) and is missing from the index. Does the "C" locale matter?
Suggested Technical Corrigendum
There are several ways this basic issue can be addressed.
In clause 7.1.1, add definitions of 'execution white-space character' and 'execution white-space wide character'. Change 'white-space character' to 'execution white-space character' throughout clause 7. Change 'white-space wide character' to 'execution white-space wide character' throughout clause 7.
Throughout clause 7, remove '(as specified by the isspace function)' and '(as specified by the iswspace function)'.
In this clause, references to "white-space character" refer to execution white-space character as defined by isspace(). References to "white-space wide character" refer to execution white-space wide character as defined by iswspace().
Throughout clause 7, remove '(as specified by the isspace function)' and '(as specified by the iswspace function)'.
Some of the above changes also require corresponding changes to Annexes A and J.
Also do these changes: