JTC1/SC22/WG14
N895
Document: WG14 N895
WG14,
Japan has the following three items for the agenda of the WG14 Kona meeting.
Please discuss them during the meeting.
1) Rationale for the 64-bit integer type
Japan needs to confirm whether the result of the discussion at the London
meeting is completely reflected in the revised Rationale (especially
the topic of elimination of long long overhead).
(Japan would like to know whether the draft written by Mr. Gwyn
after the London meeting (see attached email SC22WG14.7328) is going to
be included in the revised Rationale.)
2) Guarantee the behavior of the extended conversion utilities even after
a change of LC_CTYPE under specific conditions
Japan wants to propose a correction to the behavior of the extended
multibyte and wide-character conversion utilities in 7.24.6.
Please discuss this issue at the WG14 meeting.
Current : The behavior of the extended multibyte and wide-character
          conversion utilities in 7.24.6 after a change of LC_CTYPE
          is undefined.
Proposal: The behavior of the extended multibyte and wide-character
          conversion utilities in 7.24.6 after a change of LC_CTYPE
          is guaranteed.
Details:
The C9X FDIS has the following specification:
> 7.24.6 Extended multibyte and wide-character conversion
> utilities
> [...]
> [#2] Most of the following functions -- those that are
> listed as ``restartable'', 7.24.6.3 and 7.24.6.4 -- take as
> a last argument a pointer to an object of type mbstate_t
> that is used to describe the current conversion state from a
> particular multibyte character sequence to a wide-character
> sequence (or the reverse) under the rules of a particular
> setting for the LC_CTYPE category of the current locale.
>
> [#3] The initial conversion state corresponds, for a
> conversion in either direction, to the beginning of a new
> multibyte character in the initial shift state. A zero-
> valued mbstate_t object is (at least) one way to describe an
> initial conversion state. A zero-valued mbstate_t object
> can be used to initiate conversion involving any multibyte
> character sequence, in any LC_CTYPE category setting. If an
> mbstate_t object has been altered by any of the functions
> described in this subclause, and is then used with a
> different multibyte character sequence, or in the other
> conversion direction, or with a different LC_CTYPE category
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> setting than on earlier function calls, the behavior is
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> undefined.290)
> ^^^^^^^^^
> 290) Thus, a particular mbstate_t object can be used, for
> example, with both the mbrtowc and mbsrtowcs functions as
> long as they are used to step sequentially through the
> same multibyte character string.
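As an illustration of the usage that the quoted text does define, the
following C sketch steps sequentially through one multibyte string with a
single zero-valued mbstate_t object, under a single LC_CTYPE setting. The
helper name count_wide_chars is hypothetical and exists only for this
example.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Counts the wide characters in the multibyte string s, converting under
   the LC_CTYPE setting in effect at the call. Returns (size_t)-1 on an
   invalid or incomplete sequence. count_wide_chars is a hypothetical
   helper name, not a standard function. */
size_t count_wide_chars(const char *s)
{
    mbstate_t st;
    size_t left, n = 0;

    assert(s != NULL);
    memset(&st, 0, sizeof st);   /* zero-valued: initial conversion state */
    left = strlen(s);            /* bytes remaining, terminator excluded */

    while (left > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, left, &st);

        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;   /* encoding error or truncated character */
        if (r == 0)
            break;               /* null wide character: stop */
        s += r;                  /* same string, same state, same direction */
        left -= r;
        n++;
    }
    return n;
}
```

The same st object is reused only for the one string and the one
conversion direction; changing LC_CTYPE between the mbrtowc calls is
exactly the case that the quoted paragraph leaves undefined.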
The FDIS text quoted above does not reflect Japan's original intention as
the author of the Multibyte Support Extension (MSE), a.k.a. Amendment 1.
Japan's original intention for the behavior of the extended
multibyte and wide-character conversion utilities is described
in the Rationale of Amendment 1:
"Annex H.9.2.3 Multiple encoding environment" in the Amendment 1:
> [...]
> The encoding rule information is effectively a part of the
> conversion state. Thus, the encoding information should be stored
> with the hidden mbstate_t object with the FILE object. (Some
> implementations may even choose to store the encoding rule as
> part of the value of an fpos_t object.) The conversion state
> just created when a file is opened is said to have *unbound*
> state because it has no relations to any of the encoding
> rules. Just after the first wide-character input/output operation,
> the conversion state is *bound* to the encoding rule which
> corresponds to the LC_CTYPE category of the current locale. The
> following is a summary of the relations between various objects,
> the shift state, and the encoding rules.
>                        fpos_t      FILE
>   shift state        | included  | included  |
>   encoding rule      | maybe     | included  |
>   changing LC_CTYPE
>     (unbound)        | no effect | affected  |
>     (bound)          | no effect | no effect |
"Annex H.13.1 Conversion state" in the Amendment 1:
> To handle multiple strings with a state-dependent encoding, the
> committee introduced the concept of conversion state. The
> conversion state determines the behavior of a conversion between
> multibyte and wide-character encodings. For conversion from
> multibyte to wide character, the conversion state stores
> information such as the position within the current multibyte
> character (as a sequence of characters or a wide-character
> accumulator). And for conversions in either direction, the
> conversion state stores the current shift state (if any) and
> possibly the encoding rule.
> [...]
Please consider the following program:
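    setlocale(LC_CTYPE, A);     /* A, B: names of locales with different encodings */
    f1 = fopen("...", "r");
    wc = fgetwc(f1);            /* first wide-character input on f1 */
    setlocale(LC_CTYPE, B);
    f2 = fopen("...", "w");
    while (wc != WEOF) {
        fputwc(wc, f2);
        wc = fgetwc(f1);        /* f1 is read again after LC_CTYPE has changed */
    }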
According to the current C9X FDIS, the behavior of this program
is undefined, because the conversion state of f1 is used with a
different LC_CTYPE setting than on the earlier fgetwc call. However,
from the viewpoint of the design goal of the MSE as described in the
Rationale of Amendment 1, it should be well defined. Namely, a
conversion state in the bound state should not be affected by
changing LC_CTYPE, so that multiple strings with a state-dependent
encoding can be supported in a multiple encoding environment.
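The unbound/bound rule from the Rationale of Amendment 1 can be sketched as
a toy model in C. This is only an illustration under stated assumptions,
not a description of any real implementation: struct conv_state, conv_open
and conv_rule are hypothetical names, and the encoding rule is represented
simply by a locale name string.

```c
#include <assert.h>
#include <string.h>

#define ENC_NAME_MAX 32

/* Toy model of the hidden per-stream conversion state (hypothetical). */
struct conv_state {
    int  bound;                    /* 0 = unbound, 1 = bound */
    char encoding[ENC_NAME_MAX];   /* encoding rule captured when bound */
};

/* Models fopen: a freshly created conversion state is unbound. */
void conv_open(struct conv_state *cs)
{
    assert(cs != NULL);
    cs->bound = 0;
    cs->encoding[0] = '\0';
}

/* Models a wide-character I/O operation. The first operation binds the
   state to the encoding rule of the current LC_CTYPE setting; later
   operations keep that rule even if LC_CTYPE has changed since.
   Returns the encoding rule that governs this operation. */
const char *conv_rule(struct conv_state *cs, const char *current_lc_ctype)
{
    assert(cs != NULL && current_lc_ctype != NULL);
    if (!cs->bound) {
        strncpy(cs->encoding, current_lc_ctype, ENC_NAME_MAX - 1);
        cs->encoding[ENC_NAME_MAX - 1] = '\0';
        cs->bound = 1;
    }
    return cs->encoding;
}
```

In terms of the program above, f1 would be bound to A by the first fgetwc,
f2 would be bound to B by the first fputwc, and the later fgetwc calls on
f1 would continue to use A.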
Please discuss and consider this issue at the WG14 meeting.
If more detail is needed, please send email to c.wg@nec.co.jp.
Further technical discussion is welcome.
3) WG14 Tokyo meeting 2000/Apr
ITSCJ (Information Technology Standards Commission of Japan)
http://www.itscj.ipsj.or.jp/eg/index.html
will host the WG14 meeting, which will be held on 2000-04-10/14.
The meeting place is in the KIKAI-SHINKO-KAIKAN Building:
3-5-8, Shiba-Koen, Minato-Ku, Tokyo 105-0011 Japan
One room (Room #68, capacity 18 people, on the 6th floor) is already
booked for the WG14 meeting. (In Japan, the 1st floor is the ground floor.)
Traffic, hotel, and other accommodation information will be sent to WG14
by email soon by ITSCJ or Makoto Noda.
Thank you,
Makoto Noda, Chair of ITSCJ/SC22/C WG
--------------------------
> Date: Fri, 16 Jul 99 10:39:59 EDT
> From: "Douglas A. Gwyn (IST)" <gwyn@arl.mil>
> X-Sequence: SC22WG14@dkuug.dk 7328
> X-Errors-To: SC22WG14-request@dkuug.dk
> To: Randy Meyers <rmeyers@ix.netcom.com>
> Cc: sc22wg14 <sc22wg14@dkuug.dk>
> Subject: (c.wg 8975) (SC22WG14.7328) (SC22WG14.7294) Rationale for
>          elimination of long long overhead
>
> As a specific illustration of an implementation technique: the compiler
> can emit an external reference to a symbol ".i64used" (for example) if
> and only if the source code makes any use of a 64-bit integer type, and
> the C run-time object library can contain two versions of the _doprnt
> module (which performs the actual work for the *printf family). The first
> version of printf would define the symbol ".i64used" and contain support
> for formatting 64-bit integers, while the second version would do neither.
> If the two versions of _doprnt and the *printf modules are ordered
> properly in the library, then the linker would automatically include the
> 64-bit supporting version to satisfy the reference to ".i64used", before
> seeing any reference to _doprnt from the *printf modules, and then the
> *printf modules would use the already-defined 64-bit version of _doprnt.
> If there were no reference to ".i64used" (because the program made no
> use of any 64-bit integer type), then the linker would skip the first
> _doprnt module, would include *printf modules, then would include the
> non-64-bit _doprnt module in order to satisfy the reference to _doprnt
> from the *printf modules.
>
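The library-ordering technique in the attached email can be sketched as a
toy simulation of a linker that makes one forward pass over a library.
This is only an illustration: the member names doprnt64, printf and
doprnt32, the function link_once, and the single-pass rule are assumptions
made for the example, and real linkers differ in detail.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 4
#define MAX_PROG_REFS 8

/* A library member: the symbols it defines and the symbols it references.
   Unused slots are NULL. */
struct module {
    const char *name;
    const char *defs[MAX_SYMS];
    const char *refs[MAX_SYMS];
};

/* Order matters: 64-bit _doprnt first, then printf, then plain _doprnt. */
static const struct module lib[] = {
    { "doprnt64", { ".i64used", "_doprnt" }, { NULL } },
    { "printf",   { "printf" },              { "_doprnt" } },
    { "doprnt32", { "_doprnt" },             { NULL } },
};
enum { NMOD = sizeof lib / sizeof lib[0] };

static int in_set(const char *const *set, int n, const char *sym)
{
    int i;
    for (i = 0; i < n; i++)
        if (strcmp(set[i], sym) == 0)
            return 1;
    return 0;
}

/* One forward pass over the library. A member is pulled in only if it
   defines a symbol that is referenced but not yet defined. Sets
   included[i] to 1 for each member that is linked, else 0. */
void link_once(const char *const *prog_refs, int nrefs, int included[NMOD])
{
    const char *undef[MAX_PROG_REFS + NMOD * MAX_SYMS];
    const char *defd[NMOD * MAX_SYMS];
    int nu = 0, nd = 0, i, j;

    assert(nrefs >= 0 && nrefs <= MAX_PROG_REFS);
    for (i = 0; i < nrefs; i++)
        undef[nu++] = prog_refs[i];          /* references from the program */

    for (i = 0; i < NMOD; i++) {
        int wanted = 0;
        included[i] = 0;
        for (j = 0; j < MAX_SYMS && lib[i].defs[j]; j++)
            if (in_set(undef, nu, lib[i].defs[j]) &&
                !in_set(defd, nd, lib[i].defs[j]))
                wanted = 1;
        if (!wanted)
            continue;                        /* nothing needs this member yet */
        included[i] = 1;
        for (j = 0; j < MAX_SYMS && lib[i].defs[j]; j++)
            defd[nd++] = lib[i].defs[j];
        for (j = 0; j < MAX_SYMS && lib[i].refs[j]; j++)
            if (!in_set(defd, nd, lib[i].refs[j]))
                undef[nu++] = lib[i].refs[j];
    }
}
```

With program references to printf and .i64used, the model links doprnt64
and printf and skips doprnt32. With a reference to printf only, it skips
doprnt64 and links printf and doprnt32, so a program that uses no 64-bit
integer type carries no 64-bit formatting code.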