JTC1/SC22/WG14
N780
Document number: SC22/WG14 N780 (J11 97-144)
Title: POSIX Alignment
Author: Keld Simonsen
Author affiliation: DKUUG
Postal address: Fruebjergvej 3, DK-2100 København
Email address: keld@dkuug.dk
Telephone number: +45 3122-6543
Fax number: +45 3325-6543
Sponsor: DS
Date: 1997-09-28
Proposal category:
__ Editorial change/non-normative contribution
XX Correction
XX New feature
__ Addition to obsolescent feature list
__ Other (please specify)
Area of standard affected:
XX Environment
XX Language
__ Preprocessor
XX Library
XX Macro/typedef/tag name
XX Function
XX Header
__ Other (please specify)
Prior art: ISO/IEC 9945 POSIX standards
Target audience: general
Related documents: N431 (Rationale and analysis), N507, N538,
N586, N658, N665
Proposal attached: proposal paper
Review committee: Keld Simonsen, Rex Jaeschke, Doug Gwyn,
Frank Farance, Clive Feather
Status: stage 3, principally agreed
Abstract:
This paper proposes changes to align C9X with the POSIX standards with
respect to internationalization features.
Introduction
This paper details changes to the C standard to align it with POSIX System
API (C language) (POSIX-1) and ISO/IEC 9945-2:1993 POSIX Shell and
Utilities (POSIX-2). It does not cover newer proposals for POSIX or other
related specifications that are not yet international standards.
This document builds on N431, which gave an overview of
internationalisation in C and POSIX standards, a comparison of the
functionality and features provided, and also mentioned other
incompatibilities between C and POSIX standards. Thus N431 gave the
background and rationale for the proposed changes, and it was decided in
the Copenhagen meeting to do further work based on N431. The paper here
describes in detail what those changes should be.
7.3.1 Character testing functions
POSIX-2 adds in its section 2.5.2 a class "blank", consisting initially of
the characters <space> and <tab>. We should support this class by adding
the function isblank() (as well as iswblank()), which is similar to the
isspace() function except that it tests for a standard blank character;
initially the only characters covered are space (' ') and horizontal
tab ('\t').
-------- Start New Section ----------
7.3.1.3 The isblank function
Synopsis
[1]
#include <ctype.h>
int isblank(int c);
Description:
[2] The isblank function tests for any character that is a standard blank
character or is one of an implementation-defined set of characters, for
which isalnum is false. The standard blank characters are the
following: space (' '), and horizontal tab ('\t'). In the "C" locale,
isblank returns true only for the standard blank characters.
-------- End New Section ----------
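The behaviour required in the "C" locale can be illustrated with a minimal sketch. The name c_locale_isblank is hypothetical and is not part of the proposal; it models only the "C" locale case, where no implementation-defined blank characters exist.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for isblank, restricted to the "C" locale:
   true only for the two standard blank characters. */
int c_locale_isblank(int c)
{
    /* As for the other <ctype.h> functions, the argument is expected
       to be representable as an unsigned char or to equal EOF. */
    assert(c == EOF || (c >= 0 && c <= UCHAR_MAX));
    return c == ' ' || c == '\t';
}
```

Unlike isspace, such a function reports false for '\n', '\v', '\f' and '\r'.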
-------- Start New Section ----------
7.17.2.1.3 The iswblank function
Synopsis
[1]
#include <wctype.h>
int iswblank(wint_t c);
Description:
[2] The iswblank function tests for any wide character that is a standard
blank wide character or is one of an implementation-defined set of wide
characters, for which iswalnum is false. The standard
blank wide characters are the following: space (L' '), and horizontal tab
(L'\t'). In the "C" locale, iswblank returns true only for the standard
blank wide characters.
-------- End New Section ----------
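A typical use of the proposed wide-character function is skipping leading blanks without consuming line terminators. The helper below is a sketch only; skip_wblanks is a hypothetical name, and the code assumes an implementation that already provides iswblank in <wctype.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Returns a pointer to the first wide character of s that is not a
   blank wide character in the current locale (hypothetical helper). */
const wchar_t *skip_wblanks(const wchar_t *s)
{
    assert(s != NULL);
    while (*s != L'\0' && iswblank((wint_t)*s))
        s++;
    return s;
}
```

In the "C" locale this stops at a new-line character, which iswspace would have skipped.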
POSIX includes the following table in the standard, and it is proposed to
add it at the end of clause 7.3:
-------- Start New Section ----------
Table 1: Valid Character Class Combinations

In       Can also belong to
Class    upper  lower  alpha  digit  space  cntrl  punct  graph  print  xdigit  blank
upper    +      +      A      x      x      x      x      A      A      +       x
lower    +      +      A      x      x      x      x      A      A      +       x
alpha    +      +      +      x      x      x      x      A      A      +       x
digit    x      x      x      +      x      x      x      A      A      A       x
space    x      x      x      x      +      +      *      *      *      x       +
cntrl    x      x      x      x      +      +      x      x      x      x       +
punct    x      x      x      x      +      x      +      A      A      x       +
graph    +      +      +      +      +      x      +      +      A      +       +
print    +      +      +      +      +      x      +      +      +      +       +
xdigit   +      +      +      +      x      x      x      A      A      +       x
blank    x      x      x      x      A      +      *      *      *      x       +
NOTES:
Note 1: Explanation of codes:
A Automatically included; see text
+ Permitted
x Mutually exclusive
* See note 2
Note 2: The <space> character, which is part of the space and blank class,
cannot belong to punct or graph, but automatically shall belong to the
print class. Other space or blank characters can be classified as punct,
graph, and/or print.
-------- End New Section ----------
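A few of the rules in the table can be checked mechanically against an implementation. The sketch below tests only a handful of entries (the "A" entries for upper/alpha, digit/graph, digit/print and blank/space, and the "x" entries for digit/alpha, cntrl/print and punct with alpha or digit), and only in the "C" locale. The function name is hypothetical, and the code assumes that isblank is already available in <ctype.h>.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Counts rule violations over all unsigned char values in the "C"
   locale; 0 means the sampled table entries hold (hypothetical helper). */
int count_class_violations(void)
{
    int bad = 0;

    if (setlocale(LC_CTYPE, "C") == NULL)
        return -1;
    for (int c = 0; c <= UCHAR_MAX; c++) {
        if (isupper(c) && !isalpha(c)) bad++;                   /* A */
        if (isdigit(c) && !(isgraph(c) && isprint(c))) bad++;   /* A */
        if (isblank(c) && !isspace(c)) bad++;                   /* A */
        if (isdigit(c) && isalpha(c)) bad++;                    /* x */
        if (iscntrl(c) && isprint(c)) bad++;                    /* x */
        if (ispunct(c) && isalnum(c)) bad++;                    /* x */
    }
    return bad;
}
```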
7.3.2 Character case mapping functions
C has only an implicit statement on locale dependence for the case mapping
functions, referring to isupper/islower. The locale dependence can be made
explicit by adding text to the descriptions of to[w]upper() and
to[w]lower(), as follows:
-------- Start Changed Section ----------
7.3.2.1 The tolower function
Returns
[#3] If the argument is a character for which isupper is true and there is
a corresponding character ===>as specified by the current locale<=== for
which islower is true, the tolower function returns the corresponding
character; otherwise, the argument is returned unchanged.
-------- End Changed Section ----------
-------- Start Changed Section ----------
7.3.2.2 The toupper function
Returns
[#3] If the argument is a character for which islower is true and there is
a corresponding character ===>as specified by the current locale<=== for
which isupper is true, the toupper function returns the corresponding
character; otherwise, the argument is returned unchanged.
-------- End Changed Section ----------
-------- Start Changed Section ----------
7.17.3.1.1 The towlower function
Returns
[#3] If the argument is a wide character for which iswupper is true and
there is a corresponding wide character ===>as specified by the current
locale<=== for which iswlower is true, the towlower function returns the
corresponding wide character; otherwise, the argument is returned
unchanged.
-------- End Changed Section ----------
-------- Start Changed Section ----------
7.17.3.1.2 The towupper function
Returns
[#3] If the argument is a wide character for which iswlower is true and
there is a corresponding wide character ===>as specified by the current
locale<=== for which iswupper is true, the towupper function returns the
corresponding wide character; otherwise, the argument is returned
unchanged.
-------- End Changed Section ----------
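The effect of the "otherwise, the argument is returned unchanged" wording can be seen in a small sketch that lowercases a string in place. The helper name lower_in_place is hypothetical; the mapping applied is whatever the current locale specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Lowercases s in place using the case mapping of the current locale.
   Characters without a lowercase counterpart are left unchanged. */
void lower_in_place(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        /* Convert through unsigned char so the argument to tolower is
           always in the range the function is defined for. */
        *s = (char)tolower((unsigned char)*s);
    }
}
```

In the "C" locale, applying this to "N780 Posix" changes the letters only and leaves the digits and the space untouched.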
7.4 Localization
The POSIX-2 standard was approved after adoption of the C standard, and it
contains a format for specifying locales and accompanying charmaps. This
is a valuable and standardized way of specifying locales and should be
mentioned as a footnote, as follows:
-------- Start Changed Section ----------
7.5 Localization <locale.h>
[#3] The macros defined are NULL (described in 7.1.6); and
LC_ALL
LC_ALL *Footnote*
LC_COLLATE
...
Footnote: POSIX-2 specifies locale and charmap formats that may be used to
specify locales for C.
-------- End Changed Section ----------
A reference to the POSIX-2 standard should be added to the informative
bibliography.
The entry is:
ISO/IEC 9945-2:1993 Information technology - Portable Operating System
Interface(POSIX) - Part 2: Shell and Utilities.
7.5.2.1 int_curr_symbol different from currency_symbol
Because the order in which an amount is written in the local currency
format may differ from the order used in the international format, it is
proposed to add the following members (none of which are part of the POSIX
specification) to the lconv structure:
-------- Start Changed Section ----------
7.5 Localization <locale.h>
[#2] ...
char int_p_cs_precedes; /* CHAR_MAX */
char int_p_sep_by_space; /* CHAR_MAX */
char int_n_cs_precedes; /* CHAR_MAX */
char int_n_sep_by_space; /* CHAR_MAX */
char int_p_sign_posn; /* CHAR_MAX */
char int_n_sign_posn; /* CHAR_MAX */
-------- End Changed Section ----------
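The CHAR_MAX comments above give the values in the "C" locale, where the members are unspecified. The following sketch reads the six members through localeconv; it assumes an implementation whose struct lconv already contains them, and the function name is hypothetical.

```c
#include <limits.h>
#include <locale.h>

/* Returns 1 if all six proposed members hold CHAR_MAX in the "C"
   locale, 0 otherwise (hypothetical helper). */
int c_locale_int_members_unset(void)
{
    const struct lconv *lc;

    if (setlocale(LC_MONETARY, "C") == NULL)
        return 0;
    lc = localeconv();
    return lc->int_p_cs_precedes  == CHAR_MAX
        && lc->int_p_sep_by_space == CHAR_MAX
        && lc->int_n_cs_precedes  == CHAR_MAX
        && lc->int_n_sep_by_space == CHAR_MAX
        && lc->int_p_sign_posn    == CHAR_MAX
        && lc->int_n_sign_posn    == CHAR_MAX;
}
```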
-------- Start Changed Section ----------
7.5.2.1 The localeconv function
[#3] ...
char int_p_cs_precedes Set to 1 or 0 if the int_curr_symbol respectively
precedes or succeeds the value for a nonnegative formatted monetary
quantity.
char int_p_sep_by_space Set to 1 or 0 if the int_curr_symbol respectively
is or is not separated by a space from the value for a nonnegative
formatted monetary quantity.
char int_n_cs_precedes Set to 1 or 0 if the int_curr_symbol respectively
precedes or succeeds the value for a negative formatted monetary quantity.
char int_n_sep_by_space Set to 1 or 0 if the int_curr_symbol respectively
is or is not separated by a space from the value for a negative formatted
monetary quantity.
char int_p_sign_posn Set to a value indicating the positioning of the
positive_sign for a nonnegative formatted monetary quantity.
char int_n_sign_posn Set to a value indicating the positioning of the
negative_sign for a negative formatted monetary quantity.
-------- End Changed Section ----------
The examples in section 7.5.2.1 need to be enhanced:
There cannot be a point after ITL.
The Netherlands uses a kind of small "f".
Norway has at least a space between "kr" and the value.
Examples are needed for all the new members, int_p_cs_precedes etc.
All of this is done in the text below.
-------- Start Changed Section ----------
7.5.2.1 The localeconv function
Examples
[#8] The following table illustrates the rules which may well be used by
five countries to format monetary quantities.
Country      Positive format  Negative format  International format
Italy        L.1.234          -L.1.234         ITL 1.234
Netherlands  f 1.234,56       f -1.234,56      NLG 1.234,56
Norway       kr 1.234,56      kr 1.234,56-     NOK 1.234,56
Switzerland  SFrs.1,234.56    SFrs.1,234.56C   CHF 1,234.56
Finland      1.234,56 mk      -1.234,56 mk     FIM 1.234,56
[#9] For these five countries, the respective values for the monetary
members of the structure returned by localeconv are:
                    Italy   Netherlands  Norway  Switzerland  Finland
int_curr_symbol     "ITL "  "NLG "       "NOK "  "CHF "       "FIM "
currency_symbol     "L."    "f"          "kr"    "SFrs."      "mk"
mon_decimal_point   ""      ","          ","     "."          ","
mon_thousands_sep   "."     "."          "."     ","          "."
mon_grouping        "\3"    "\3"         "\3"    "\3"         "\3"
positive_sign       ""      ""           ""      ""           ""
negative_sign       "-"     "-"          "-"     "C"          "-"
int_frac_digits     0       2            2       2            2
frac_digits         0       2            2       2            2
p_cs_precedes       1       1            1       1            0
p_sep_by_space      0       1            0       0            1
n_cs_precedes       1       1            1       1            0
n_sep_by_space      0       1            0       0            1
p_sign_posn         1       1            1       1            1
n_sign_posn         1       4            2       2            1
int_p_cs_precedes   1       1            1       1            1
int_p_sep_by_space  0       1            0       0            1
int_n_cs_precedes   1       1            1       1            1
int_n_sep_by_space  0       1            0       0            1
int_p_sign_posn     1       1            1       1            1
int_n_sign_posn     1       4            2       2            4
-------- End Changed Section ----------
7.4.2.1 p_sep_by_space and n_sep_by_space
POSIX has added a third possibility for the spacing in a formatted
monetary quantity, so there are now three cases:
No space separates the currency_symbol from the value.
A space separates the symbol from the value.
*New* A space separates the symbol and the value, if these entities are next
to each other.
-------- Start Changed Section ----------
7.5.2.1 The localeconv function
[#3] ...
char p_sep_by_space Set to 0 if no space separates the currency_symbol
from the value for a nonnegative formatted monetary quantity; set to 1 if
a space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.
char n_sep_by_space Set to 0 if no space separates the currency_symbol
from the value for a negative formatted monetary quantity; set to 1 if a
space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.
char int_p_sep_by_space Set to 0 if no space separates the int_curr_symbol
from the value for a nonnegative formatted monetary quantity; set to 1 if
a space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.
char int_n_sep_by_space Set to 0 if no space separates the int_curr_symbol
from the value for a negative formatted monetary quantity; set to 1 if a
space separates the symbol from the value; and set to 2 if a space
separates the symbol and the value, if adjacent.
-------- End Changed Section ----------
---------------- added section in rationale -----------------
This section should go into the rationale.
The table below gives example formats for the combinations of
p_cs_precedes, p_sign_posn and p_sep_by_space, assuming that the
positive_sign is "+" and the currency_symbol is "$".
                                      p_sep_by_space
                                      2          1          0
p_cs_precedes = 1  p_sign_posn = 0    ($ 1.25)   ($ 1.25)   ($1.25)
                   p_sign_posn = 1    + $1.25    +$ 1.25    +$1.25
                   p_sign_posn = 2    $1.25 +    $ 1.25+    $1.25+
                   p_sign_posn = 3    + $1.25    +$ 1.25    +$1.25
                   p_sign_posn = 4    $ +1.25    $+ 1.25    $+1.25
p_cs_precedes = 0  p_sign_posn = 0    (1.25 $)   (1.25 $)   (1.25$)
                   p_sign_posn = 1    +1.25 $    +1.25 $    +1.25$
                   p_sign_posn = 2    1.25$ +    1.25 $+    1.25$+
                   p_sign_posn = 3    1.25+ $    1.25 +$    1.25+$
                   p_sign_posn = 4    1.25$ +    1.25 $+    1.25$+
------------------------------------------------------------------
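Two rows of the rationale table can be reproduced with a short sketch. It covers only p_cs_precedes = 1 with p_sign_posn = 1 or 2, and the placement of the space for p_sep_by_space = 2 is an assumption read off those two table rows rather than a rule stated elsewhere in this paper. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Formats a nonnegative amount for p_cs_precedes = 1 only.
   sign_posn: 1 (sign precedes symbol and value) or 2 (sign follows).
   sep: the p_sep_by_space value, 0, 1 or 2.
   Returns the snprintf result, i.e. the length of the full string. */
int format_positive(char *buf, size_t n, int sign_posn, int sep,
                    const char *sign, const char *sym, const char *val)
{
    assert(sep >= 0 && sep <= 2);
    assert(sign_posn == 1 || sign_posn == 2);

    if (sign_posn == 1) {
        switch (sep) {
        case 0:  return snprintf(buf, n, "%s%s%s", sign, sym, val);
        case 1:  return snprintf(buf, n, "%s%s %s", sign, sym, val);
        default: return snprintf(buf, n, "%s %s%s", sign, sym, val);
        }
    }
    /* sign_posn == 2: the sign string follows the value */
    switch (sep) {
    case 0:  return snprintf(buf, n, "%s%s%s", sym, val, sign);
    case 1:  return snprintf(buf, n, "%s %s%s", sym, val, sign);
    default: return snprintf(buf, n, "%s%s %s", sym, val, sign);
    }
}
```

Called with sign "+", symbol "$" and value "1.25", the three p_sep_by_space values give the entries of the corresponding table row, for example "+ $1.25" for p_sign_posn = 1 and p_sep_by_space = 2.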