SC22/WG20 N780



Language codes: report to ISO/IEC JTC1/SC22/WG20.

John Clews


Overview

There is a certain amount of incompatibility between the various
standards for language coding. I would recommend that JTC1/SC22/WG20
members look at Peter Constable's recent, well-argued paper at the
International Unicode Conference for further clarification of the
issues. I hope that information on accessing this paper will be passed
to the JTC1/SC22/WG20 convenor shortly, for distribution.

In addition, the ISO standards process itself seems unable to deliver
the number of codes that many IT vendors will require in a globalised
market.

The report below also looks at some areas of incompatibility that
might impact on JTC1/SC22/WG20 standards.


1. ISO/TC37/SC2/WG1 (Language Codes)

I attended ISO/TC37/SC2/WG1 (Language Codes) in London, and its
parent SC, ISO/TC37/SC2 (Coding Systems). The WG1 convenor, who is
also the project leader (i.e. project editor) for ISO 639-1, was
extremely apologetic that the meeting was limited to 90 minutes,
given the importance of ISO 639-1. Following national body votes and
comments (in a postal ballot and at the ISO/TC37/SC2/WG1 meeting),
ISO 639-1 will replace ISO 639, which is the only language code
standard that RFC 1766: Language Tags refers to normatively.

As a result of the vote on the DIS, ISO DIS 639-1 will automatically
become a standard.

That is, the 2-letter codes in ISO 639-1 will replace the 2-letter
codes in ISO 639. It is ISO 639 that is referred to normatively in
some ISO/IEC JTC1/SC22/WG20 standards.

ISO/IEC JTC1/SC22/WG20 will need to consider whether its standards
should be updated in this regard, and also whether they should allow
the inclusion of 3-letter codes from ISO 639-2.
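To make the 2-letter versus 3-letter question concrete, here is a minimal Python sketch. It assumes a simplified reading of the RFC 1766 tag syntax: a primary tag and optional subtags of 1 to 8 letters separated by hyphens, with 2-letter primary tags read as ISO 639 codes, "i" reserved for IANA registrations, "x" reserved for private use, and a 2-letter first subtag read as an ISO 3166 country code. The function name classify and the allow_iso639_2 switch are hypothetical. The switch models a standard that, unlike RFC 1766, also admits 3-letter ISO 639-2 primary tags such as "haw" (Hawaiian). No code lists are checked. Only the shape of each tag is examined.

```python
import re

# Simplified RFC 1766 syntax: Primary-tag *( "-" Subtag ),
# where each part is 1 to 8 ASCII letters.
TAG_RE = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def classify(tag: str, allow_iso639_2: bool = False) -> dict:
    """Report how a tag's primary tag and first subtag would be read.

    Hypothetical sketch: allow_iso639_2 models a standard that also
    admits 3-letter ISO 639-2 primary tags, which RFC 1766 does not.
    No code lists are consulted; only the shape of the tag is checked.
    """
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not a syntactically valid tag: {tag!r}")
    primary, *subtags = tag.lower().split("-")

    if len(primary) == 2:
        kind = "ISO 639 (2-letter)"
    elif primary == "i":
        kind = "IANA registration"
    elif primary == "x":
        kind = "private use"
    elif len(primary) == 3 and allow_iso639_2:
        kind = "ISO 639-2 (3-letter)"
    else:
        raise ValueError(f"primary tag {primary!r} not permitted")

    # A 2-letter first subtag is read as an ISO 3166 alpha-2 country code.
    # Simplifying assumption of this sketch: the rule is applied only after
    # a language code, not after the reserved "i" and "x" prefixes.
    country = None
    if subtags and len(subtags[0]) == 2 and primary not in ("i", "x"):
        country = subtags[0].upper()
    return {"primary": primary, "kind": kind, "country": country}


for t in ("en-GB", "x-klingon", "haw"):
    try:
        print(t, classify(t))
    except ValueError as err:
        print(t, "rejected:", err)
print("haw", classify("haw", allow_iso639_2=True))
```

With the default setting "haw" is rejected. With allow_iso639_2=True it is accepted. A standard that refers only to ISO 639 would have to settle this kind of difference explicitly.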


2. RFC 1766: Language Tags

There is currently a review of RFC 1766: Language Tags. There is too
little flow of information between the review group and ISO/TC37/SC2,
which is responsible for Language Codes.

This may lead to versioning problems between ISO 639 and its
successor on the one hand, and RFC 1766 and its successor on the
other, leaving a few "loose", ambiguous codes (Hawaiian is one
example). Such codes could affect IT systems, and it would be difficult
for IT end-users who are not active in ISO/TC37/SC2/WG1 or the ISO 639
Joint Advisory Committee to work out where these ambiguities lie.

ISO/IEC JTC1/SC22/WG20 needs to keep an eye on potential problems
here and, if there is any danger of versioning problems, suggest
delays in some aspects of the RFC 1766 development process.


3. ISO/TC37/SC2 (Coding Systems)

I also attended the parent SC, ISO/TC37/SC2 (Coding Systems).
Aat Vervoorn stepped down as SC chair. The good news for
ISO/TC37/SC2 is that Gerhard Budin (Austria) is the strongest
candidate to replace him. He is very aware of IT issues and of
language codes in IT standards, being involved in various projects
funded by the European Commission (notably the current SALT project)
and sometimes in CEN/TC304: Information and Communications
Technologies: European Localization Requirements.

However, the bad news is that the vagaries of ISO voting could mean
that one of the other candidates is elected.


4. ISO 639-3

Gerhard Budin also proposed a New Work Item (NWI), provisionally
known as "ISO 639 part 3"; ISO Central Secretariat (ISO CS) still has
to decide whether it receives that number or another. "ISO 639 part 3"
goes towards providing codes for more language entities (the lack of
which is a criticism that has been leveled against ISO 639 and ISO 639
part 2). It would also provide a more structured mechanism for
combining language codes with other codes (country codes from
ISO 3166, and potentially from ISO 3166 part 2, and script codes from
the draft ISO 15924) than is currently provided by any part of ISO 639
or by RFC 1766.

It would aim to overcome the "versioning problem" between ISO 639 and
RFC 1766 and their successors. However, there is (yet again) a chance
of ISO 639 developments and RFC 1766 developments (strictly speaking,
the development of the successor to RFC 1766) getting out of step with
each other if the IETF group sets the successor to RFC 1766 in stone
too rigidly. In my view it may be better for that group to hold back
on some areas.


5. ISO 639-2 (Language codes) and ISO/TC46/SC4/WG1

For the 3-letter codes in ISO 639-2 (bibliographic codes, though not
restricted in practice to bibliographic use), there are restrictions
on which languages can get codes. A "number of (written) documents"
barrier has to be overcome: for each language considered for addition
there is supposed to be proof of 50 documents, even though many
languages already in the standard fail to meet that criterion.

These 3-letter codes are also being considered for inclusion in the
successor to RFC 1766. By contrast, only 2-letter codes are referred
to normatively in RFC 1766 itself, and only 2-letter codes are
referred to normatively in standards of ISO/IEC JTC1/SC22/WG20.


6. The ISO 639 Joint Advisory Committee (Language Codes)

The ISO 639 Joint Advisory Committee seems in practice to have
replaced the moribund ISO 639 Joint Working Group.

The ISO 639 Joint Advisory Committee is also very slow, and fails to
meet IT needs. At its last meeting, in February 2000, it did add
several languages from the draft ISO 639-1 (ensuring some, but not
total, compatibility between ISO 639-1 and ISO 639-2). Beyond that,
it approved only 6 codes at that meeting, plus one further code for
Low German/Low Saxon (also used in the Netherlands, and distinct from
German or any of its dialects). Even then, it decided on a new code
rather than using the existing 3-letter code already in use in UK and
Swedish bibliographic standards.

The ISO 639 Joint Advisory Committee also ignored UK requests for
codes for several additional languages: despite repeated UK requests,
the convenor did not distribute the paper requesting these codes.

There has been some disarray in the ISO 639 Joint Advisory Committee
(JAC): ISO procedures have not been followed, some documents submitted
by JAC Observers were not distributed, and JAC membership issues have
not always been clear.

These problems were largely exacerbated by a very public email row/rant
between Marion Gunn and Michael Everson, despite both acting as
alternates for each other on the ISO 639 Joint Advisory Committee,
both representing Ireland on various ISO committees and both being
partners in the same company in Dublin.


7. IUC (International Unicode Conference)

At the IUC (International Unicode Conference) Peter Constable of the
Summer Institute of Linguistics (SIL) presented a paper which takes a
measured view of the various problems outlined above.

In the past, SIL has been approached by UNESCO, and by IT industry
representatives, about using the 3-letter Ethnologue codes (or SIL
codes) in various contexts and applications.

There have also been recent discussions of these issues on the
iso639@dkuug.dk list.

There seems to be more interest from the IT industry in these codes
and/or these entities than in the ISO 639 codes.

There are some issues of language definition in the SIL codes that
need to be addressed, but fewer than in ISO 639, whose codes name
languages but do not define or identify them.

It seems likely that work on "ISO 639 part 3" will also relate to
this. Given the IT industry interest, and the likely convergence of
de facto, RFC and ISO codes, it would in my view be preferable for
ISO/IEC JTC1/SC22/WG20 (and for the IETF) to await further
developments in language coding in this area ("ISO 639 part 3" and
SIL codes) before updating anything in the standards of ISO/IEC
JTC1/SC22/WG20.

I hope that Peter Constable's IUC paper can be distributed as an
ISO/IEC JTC1/SC22/WG20 paper.


John Clews

20 September 2000.


--
John Clews, SESAME Computer Projects, 8 Avenue Rd, Harrogate, HG2 7PG
tel: +44 1423 888 432; fax: +44 1423 889061;
Email: Converse@sesame.demon.co.uk

Committee Chair of ISO/TC46/SC2: Conversion of Written Languages;
Committee Member of ISO/IEC JTC1/SC22/WG20: Internationalization;
Committee Member of ISO/TC37: Terminology;
Committee Member of the Foundation for Endangered Languages.
