Last update: 1997-05-20
9945-2-29
Class: No change
_____________________________________________________________________________
Topic: regular expressions
Relevant Sections: 2.8
Defect Report:
-----------------------
Please provide an interpretation of the following taken from
Section 2.8 of ISO/IEC 9945-2:1993.
I think I know what the specified behavior is for the
following cases, but maybe I've opened an interesting
question or two.
Given a locale in which "ch" is a multiple character
collating element that collates between "c" and "d", then
certainly
    [[.ch.]]    matches "ch".
This makes it pretty clear that
    [^[.ch.]]   doesn't match "ch" (and not even just the "c").
Therefore, consistency argues that
    [^c]        matches "ch"
And, of course,
    [c]         doesn't match "ch" (and not even just the "c").
If we're in agreement so far, then the simple rule is that
if the string to check against a bracket expression can be
taken as a multiple character collating element, then the
matching process must do so.
I'm pretty sure about the above. What I'm not so sure about
is the behavior for character classes. Take, for example,
    [[:alpha:]]
when presented with "ch". The rationale for POSIX.2
confirms that ``character classes are not intended to
include collating elements''. However, there are still two
possible answers: "ch" doesn't match, and the "c" of "ch"
matches. I like neither of these answers; neither fits my
intuitive belief that "ch" should match as a unit. Even
worse, the nonportable
    [a-z]       *does* match the unit "ch"!
What is actually specified for [[:alpha:]] here?
WG15 response for 9945-2:1993
-----------------------------------
A character class expression is defined in section 2.8.3.2 of the
standard, as a set of characters belonging to a character class, as
defined in the LC_CTYPE category of the current locale. A range
expression is defined in the same section as a set of collating elements
that fall between two elements in the current collation sequence,
inclusive.
Thus, a collating element ch, which is not a character, would be matched
by the range expression [a-z], but not by the character class (set of
specific characters specified in the locale file) [:alpha:]. [:alpha:]
would match the 'c' and the 'h' individually, for the same reason that
the expression [c] matches the 'c' in ch, but not the collating element
ch.
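
The distinction drawn in the response can be illustrated with a small
sketch. This is a toy model only, not text from the standard and not a
regular expression implementation: the names COLLATION, ALPHA, in_range
and in_alpha_class are hypothetical, and the locale is an assumed one in
which "ch" collates between "c" and "d". A range is modelled as a set of
collating elements, a character class as a set of single characters.

```python
import string

# Hypothetical toy locale: the collation sequence is a..z with the
# multi-character collating element "ch" placed between "c" and "d".
COLLATION = []
for letter in string.ascii_lowercase:
    COLLATION.append(letter)
    if letter == "c":
        COLLATION.append("ch")

# Stand-in for the LC_CTYPE "alpha" class: single characters only.
ALPHA = set(string.ascii_lowercase)


def in_range(element, low, high):
    """Range expression: true if element collates between low and high,
    inclusive. Operates on collating elements, so "ch" is eligible."""
    pos = COLLATION.index(element)
    return COLLATION.index(low) <= pos <= COLLATION.index(high)


def in_alpha_class(element):
    """Character class expression: true only for a single character
    that belongs to the class. A collating element such as "ch" is not
    a character, so it is never a member."""
    return len(element) == 1 and element in ALPHA


if __name__ == "__main__":
    print(in_range("ch", "a", "z"))    # range [a-z] against the unit "ch"
    print(in_alpha_class("ch"))        # [:alpha:] against the unit "ch"
    print(in_alpha_class("c"), in_alpha_class("h"))  # individual characters
```

Under these assumptions, the range test accepts the unit "ch", while the
class test rejects the unit but accepts "c" and "h" taken individually.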
Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________