Submitter: UK C Panel
Submission Date: 2001-09-07
Source: Clive D.W. Feather <clive@demon.net>
Version: 1.1
Date: 2002-03-06
Subject: mbtowc and partial
characters
If mbtowc() is given a partial character (or an escape sequence that isn't a complete character), it returns -1. However, is it supposed to remember the relevant state information or should it ignore it ?
Consider an implementation where the character '\xE' starts an alternate shift state and '\xF' returns to the initial shift state. The wide character encodings are:
initial shift state: | 'x' maps to ASCII codes | |
alternate shift state: | 'x' maps to ASCII codes + 0x100 |
Starting in the initial shift state,
mbtowc (&wc, "\xEZ", 2);
should return 2 and set wc to 0x15A. However, starting in the initial shift state, consider:
mbtowc (&wc1, "\xE", 1); mbtowc (&wc2, "Z", 1);
I would expect that the first call returns -1, leaving wc1 unaltered, while the second returns 1 and sets wc2 to 0x5A. However, is it permitted for the second to set wc2 to 0x15A ? If so, how is an application meant to use mbtowc ?
[The newer function mbrtowc does not have this problem.]
Suggested Technical Corrigendum
The UK C Panel prefers to add a new return value for this case. To do so, change the main part (see the previous DR) of 7.20.7.2#3 to read:
If s is a null pointer, the mbtowc function returns a nonzero or zero value, if multibyte character encodings, respectively, do or do not have state-dependent encodings. If s is not a null pointer, the mbtowc function returns the first of the following that applies (given the current conversion state):
0 if s points to the null character between 1 and n inclusive if the next n or fewer bytes complete a valid multibyte character (which is the value stored); the value returned is the number of bytes that complete the multibyte character. The value returned will not be greater than that of the MB_CUR_MAX macro. (size_t)(-2) if the next n bytes contribute to an incomplete (but potentially valid) multibyte character, and all n bytes have been processed (no value is stored). (size_t)(-1) if an encoding error occurs, in which case the next n or fewer bytes do not contribute to a complete and valid multibyte character (no value is stored); the value of the macro EILSEQ is stored in errno, and the conversion state is unspecified.
(note that most of this wording comes from mbrtowc) and delete #4.
If this option is unacceptable, then append to 7.20.7.2#2:
If the next multibyte character is incomplete or invalid, the shift state is unaffected and nothing is stored.
Committee Response
The Committee believe the behavior of this example is unspecified. The mbrtowc() function provides a superior migration path, so we are leaving this alone.