Last update: 1997-05-20
9945-2-41
_____________________________________________________________________________
Topic: I18N issues - locales
Relevant Sections: 2.5.2.1, 2.8.3.2
Classification: Q1-6: Unaddressed Issues.
Q7: Ambiguous Issue.
Q8: No Change.
Defect Report:
-----------------------
(from Andrew Hume and Doug McIlroy)
Issue B
[1]
The specification of locales and the interface to them
discriminates against non-vendor-supplied software. In
particular, it is impossible to write a portable implementation
of regcomp() and regexec(), as there is no standardised
interface to the vital knowledge presumably set up by a call
to setlocale(). This knowledge is detailed below; in brief,
the first item seems an oversight and the others are necessary
to use the locale information.
________________________________________
[2] How can membership in class :blank: be determined
portably? [2.5.2.1, 2.8.3.2(6)]
Proposed Solution:
Provide a ctype function isblank().
Rationale:
It is inconsistent that blank be the only LC_CTYPE
character class without a C binding. Note that this extension
introduces a difference between the C and POSIX locales.
________________________________________
[3] How can the meaning of an arbitrary equivalence class
be discovered portably?
Proposed Solution:
Provide a function that, given any name for an
equivalence class, returns a list of names of collating symbols
in the class. The order of the list shall be the same
regardless of what name is given.
Rationale:
This is needed if an application, such as a searching
or sorting tool, requires this locale-specific information.
In particular, regcomp() and the sort utility need it.
________________________________________
[4] How can the meaning/value of an arbitrary collating
symbol be determined portably?
Proposed Solution:
Provide a function that, given a collating symbol,
returns the representation and length of the symbol.
Rationale:
This is needed if an application, such as a searching
or sorting tool, requires this locale-specific information.
________________________________________
[5] How can the collating elements in a string be found and
compared portably?
Proposed Solution:
Provide a function that returns the length and the
weight vector for the collating element at the beginning of
the string.
Rationale:
This is needed if an application, such as a searching
or sorting tool, requires this locale-specific information.
________________________________________
[6] How can regcomp() expand a range expression into a
list of collating elements portably?
Proposed Solution:
Provide a successor function that, given the name of a
collating element, returns the name of the collating element
with the next larger weight vector. For this purpose two
elements with the same weight vector compare in the order of
their equivalence listing.
Rationale:
This is needed if an application, such as a searching
or sorting tool, requires this locale-specific information.
It may further be useful to have a way to inquire whether a
locale contains any multicharacter collating elements.
________________________________________
[7] Lines 2918-20 say that an equivalence class expression
that names a collating element not in an equivalence
class shall be treated as a collating symbol. Does
this statement affect the meaning of ``collating
symbol'' in line 3306? Does it eliminate such equivalence
class expressions from consideration in lines 2943-5?
Proposed Solution:
Change lines 2918-2920 to say ``the expression shall be
understood as an equivalence class that contains only the
one collating element.''
We would actually prefer that singleton equivalence classes
be admitted in the definitions of 2.5.2.2.
Rationale:
This question affects the meaning of range expressions.
Lines 2918-20 could be construed as forcing [[=CE1=]-[=CE2=]]
to mean [[.CE1.]-[.CE2.]] in some cases, although
the former expression looks syntactically incorrect. The
preferred solution agrees with customary mathematical usage,
and clarifies the behavior of the equivalence-class function
proposed in [3] above.
________________________________________
[8] What if collation changes between regcomp() and
regexec()?
Proposed Solution:
The result is undefined.
Rationale:
For the common case of locales in which all collating
elements are single characters, regcomp() should be allowed
to compile character classes. At the same time, regexec()
should be allowed to handle multicharacter collating
symbols. The proposed resolution ensures that both desiderata
are met.
WG15 response for 9945-2:1993
-----------------------------------
Q1 The standard does not speak to this issue and no conformance
distinction can be made between alternative implementations based on
this.
The standard does not require that an implementation conforming
to the standard be portable. Therefore, there is no requirement
that the functionality be specified by the standard.
Concerns are being forwarded to the sponsor.
Q2,Q3,Q4,Q5,Q6
The standard does not speak to these issues and no conformance
distinction can be made between alternative implementations based on
this.
Concerns are being forwarded to the sponsor.
Q7 The standard is unclear on this issue, and as such no conformance
distinction can be made between alternative implementations based
on this. This is being referred to the sponsor.
Q8 The standard states the required behavior and
conforming implementations shall conform to this. According to
P.2, page 729, lines 367-368, the standard specifies that the result
is undefined.
Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________