Last update: 1997-05-20
9945-2-9
Class: No change
_____________________________________________________________________________
Topic: LC_CTYPE
Relevant Sections: E.3.5.3
Defect Report:
-----------------------
In Section 3.5.3 - Variables, the standard states that:
This variable [LC_CTYPE] shall determine the
interpretation of sequences of bytes of text data
as characters (e.g. single- versus multibyte
characters), which characters are defined as
letters (character class alpha) and <blank>s
(character class blank), and the behaviour of
character classes within pattern matching.
Changing the value of LC_CTYPE after the shell has
started shall not affect the lexical processing of
shell commands in the current shell execution
environment or its subshells (see 3.12).
[Draft 12 of ISO/IEC 9945-2:1993 (July 1992), p. 128, lines
268-276]
The standard also states that the LANG variable ``shall
provide a default value for the LC_* variables, as described
in 2.6'' [Ibid., p. 128, line 261] and that the LC_ALL
variable ``shall interact with the LANG and LC_* variables
as described in 2.6.'' [Ibid., p. 128, line 264]
In Section 2.6 - Environment Variables, the standard
summarizes the meanings of these variables:
LANG      This variable shall determine the
          locale category for any category not
          specifically selected via a variable
          starting with LC_. LANG and the LC_
          variables can be used by applications
          to determine the language for messages
          and instructions, collating sequences,
          date formats, etc. Additional
          semantics of this variable, if any,
          are implementation defined.
LC_ALL    This variable shall override the value
          of the LANG variable and the value of
          any of the other variables starting
          with LC_.
[...]
LC_CTYPE  This variable shall determine the
          locale category for character handling
          functions. This environment variable
          shall determine the interpretation of
          sequences of bytes of text data as
          characters (e.g. single- versus
          multibyte characters), the
          classification of characters (e.g.
          alpha, digit, graph), and the
          behaviour of character classes.
          Additional semantics of this variable,
          if any, are implementation defined.
[Ibid., pp. 76-77, lines 2635-2658]
Does changing LC_ALL (or LANG if LC_CTYPE is not set) affect
the lexical processing of shell commands in the current
shell execution environment? Is the intent of the standard
that any changes to environment variables that cause a new
LC_CTYPE to be used shall be ignored by the shell once it has
started execution?
An implementation of sh must use the locale specified in
LC_CTYPE when reading a script. For example,
isalpha/isalnum are used to parse variable names.
Consider this simple command:
FO<O-umlaut>=BAR cmd
If isalnum('<O-umlaut>') is true, then this will parse as a
variable assignment; otherwise the whole word is argument 0.
Similarly, cmd will be subject to alias expansion only in the
former case. There is
no need to validate variable names at other times. In such
an implementation, changing LC_CTYPE causes no problems.
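The classification described above can be sketched in shell itself.
The helper below, classify_word, is hypothetical and is not taken from
the standard or from any implementation. It assumes that the shell's
own pattern matching honours [:alpha:] and [:alnum:] under the current
LC_CTYPE, which not every shell does for multibyte characters. It only
illustrates how the same first word could be an assignment in one
locale and a command name in another.

```shell
# Hypothetical helper (an illustration, not part of any standard):
# report how a lexer that consults LC_CTYPE might classify the first
# word of a simple command.
classify_word() {
    word=$1
    case $word in
        *=*) name=${word%%=*} ;;               # text before the first '='
        *)   printf '%s\n' command; return ;;  # no '=', so not an assignment
    esac
    case $name in
        # Empty name, bad first character, or any bad later character.
        '' | [![:alpha:]_]* | *[![:alnum:]_]*) printf '%s\n' command ;;
        *) printf '%s\n' assignment ;;
    esac
}
```

With names drawn from the portable character set the result does not
depend on the locale: classify_word FOO=BAR reports assignment and
classify_word FO-O=BAR reports command. Whether FO<O-umlaut>=BAR is
reported as an assignment depends on the locale in effect and on the
shell's multibyte support.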
What are the problems with the following commands:
LANG=locale-with-O-umlaut
FO<O-umlaut>=BAR
Then consider this sequence of commands:
[ -n "$FO<O-umlaut>" ] && alias echo=:
echo foo
In both cases the parsing of the second line is determined
by the execution of the first line. Traditional
implementations execute the first line, then parse and
execute the second line. What would a compiler do?
On the other hand, if they were embedded in {...} or any
other shell compound command, they would both be parsed
before being executed. So we have two cases where behaviour
is poorly defined or context dependent.
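The difference between the two cases can be observed with aliases
alone, without involving any locale. The following is a minimal
sketch, assuming that sh expands aliases when run non-interactively
and that no command named greet_demo (a hypothetical name chosen for
this illustration) exists on the search path.

```shell
# greet_demo is a hypothetical alias name used only for this illustration.

# Case 1: two separate lines. The first line is executed before the
# second line is parsed, so the alias is already defined.
separate=$(sh 2>/dev/null <<'EOF'
alias greet_demo='echo aliased'
greet_demo
EOF
) || true

# Case 2: the same two commands inside a brace group. The whole group
# is parsed before any of it is executed, so greet_demo is not yet an
# alias when the second command is read.
grouped=$(sh 2>/dev/null <<'EOF'
{
alias greet_demo='echo aliased'
greet_demo
}
EOF
) || true

printf 'separate lines: %s\n' "${separate:-(greet_demo not found)}"
printf 'brace group:    %s\n' "${grouped:-(greet_demo not found)}"
```

Under those assumptions the first case is expected to run the alias,
while the second is expected to fail to find greet_demo. A locale
change that altered lexical processing would be subject to the same
context dependence.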
I suggest that the behaviour of setting LC_CTYPE be made
undefined. Changing LANG in an interactive shell is a
reasonable thing to do, and an implementation may
immediately change all locales with no problems. Having all
but one locale change, and just in the shell, is unintuitive
and not required.
WG15 response for 9945-2:1993
-----------------------------------
The standard clearly states that changes to LC_CTYPE shall
not take effect within the current shell execution
environment. This is discussed in the rationale in Section
E.3.5.3.
Rationale for Interpretation:
-----------------------------
None.
_____________________________________________________________________________