openoffice-dev mailing list archives

From Dirk-Willem van Gulik <>
Subject Re: Two Languages: One ISO-639-# code
Date Tue, 19 May 2015 10:20:41 GMT

> On 19 May 2015, at 11:52, toki <> wrote:
> On 19/05/2015 08:05, Dirk-Willem van Gulik wrote:
>>> In testing out various grammar and spell checkers, I've come across a
>>> couple of instances, where different languages/dialects share the same
>>> ISO-639-# code.
>> Can you give an example
> ISO 639-1 is xo
> ISO 639-2 is xho
> ISO 639-3 is xho
> Glotolog is xhos1239
> ISO 3166-1 ZA / ZAF / 710
> ISO 3166-2 ZA-EC
> and
> ISO 639-1 is xo
> ISO 639-2 is xho
> ISO 639-3 is xho
> Glotolog is mpon1252
> ISO 3166-1 ZA / ZAF / 710
> ISO 3166-2 ZA-NL
> (Please skip the debate about whether or not the enclaves are KwaZulu,
> the Eastern Cape, or Lesotho.)

OK - good examples. So the 639 codes map as follows. Two map to the actual language in
current use; one maps to the language families and groups that xho and its dialects, like
Mpondo, belong to.

	-1 (xh equivalent)
	-2 and -3: (xho)

So -1, -2 and -3 are equivalent. And 1:1 with xhos1239 in Glottolog? And -5 is a red herring
- it maps to the language families and groups of the xho languages.

Now as far as I can see - mpon1252 is a dialect (Mpondo) within xhos1239.

It has no entry of its own in -3 or within -5; so its closest match is xh/xho/xho in -1, -2, -3;
and it certainly belongs in -5 xho.

Or in other words: SIL (or the US Library of Congress for -5) has not assigned it a code (yet).

So in ISO 639-X the most accurately you can pinpoint it is xh and then xho.

And in Glottolog, you have mpon1252 as its most precise identifier.

Now as it *happens* - this language is spoken in an area fully covered by a single country
- so you can use 3166 as a country (-1, ZA) or region (-2, ZA-EC, ZA-NL) specifier, and
then refine it. That works only because the region happens to map more or less onto the
language spoken there (and let's assume that no other languages are spoken in that region
or country).

> For a slightly different example, I give you Koine Greek and Attic Greek
> .
> Linguist-List codes them as grc-koi & grc-att, respectively.
> ISO 639-2 code is GRC. ISO 639-3 is GRC. No ISO 639-1 code.
> I wish all dialects/languages were as accommodating as:
> Gottolog lush1251
> ISO 639-1 none;
> ISO 639-2 none;
> ISO 639-3 LUT;
> ISO 639-3 SKA;
> ISO 639-3 SNO;
> ISO 639-3 SLH;
> (Note: AFAIK, there are no spell checkers or grammar checkers for those
> dialects, for any office suite.)

So also good examples - and I think the same applies:

-	you get broad specifiers at the -1, -2 level.
-	you may get granular specifiers in -3 and -5 for the rarer/older languages.
-	for dialects and more refined pinpointing you hit the limits of 639(-5) and have
	two options: petition SIL/Library of Congress to add one (the above examples are all
	in scope), or rely on Glottolog.
-	using regional coding (3166) does not really help you, as it does not define language.

Pragmatically that means using an exact -3 if you have it (i.e. the exact language match);
relying on the nearest ‘above’ -5 language family identifier when there is no -3 match
to be had; and ONLY in the -5 case add whatever you can, e.g. the glottolog identifier, to
refine it.
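
A rough sketch of that selection rule in Python, purely as an illustration. The names
`LanguageId` and `pick_language_id` are hypothetical and not part of any existing code; the
only behaviour taken from the text above is "exact -3 first, otherwise -5 plus a refinement".

```python
from typing import NamedTuple, Optional


class LanguageId(NamedTuple):
    scheme: str            # "iso-639-3" or "iso-639-5"
    code: str              # alpha-3 language or language-group code
    other: Optional[str]   # optional refinement, e.g. a Glottolog identifier


def pick_language_id(
    iso639_3: Optional[str],
    iso639_5: Optional[str],
    glottolog: Optional[str] = None,
) -> LanguageId:
    """Prefer an exact -3 code; otherwise fall back to the -5 group plus a refinement."""
    if iso639_3:
        # Exact language match: use it as-is and attach no refinement.
        return LanguageId("iso-639-3", iso639_3.lower(), None)
    if iso639_5:
        # No -3 match: use the nearest group 'above' and refine it when possible.
        return LanguageId("iso-639-5", iso639_5.lower(), glottolog)
    raise ValueError("need at least an ISO 639-3 or an ISO 639-5 code")
```

With the codes from this thread, `pick_language_id("xho", None)` keeps the plain -3 code, while
`pick_language_id(None, "xho", "mpon1252")` falls back to the group and carries the Glottolog
identifier as the refinement.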

And because -3 and -5 use similar identifiers for languages actually spoken (xho) and the
language group (xho) to which Mpondo belongs, the identifier you expose should probably be
something like

	iso-639-3:lang			lang = alpha-3 language identifier
						langgroup = alpha-3 language families and groups identifier
						other = optional identifier; taken from Glottolog when available.

or something along those lines. And discourage -1 and 3166 use, though permit it in :other
if there is no Glottolog entry.
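
To make the shape of such an identifier concrete, here is a minimal Python sketch. It assumes
a colon-separated `scheme:code[:other]` layout with the scheme being either `iso-639-3` or
`iso-639-5`; that exact layout, and the function names `format_tag` and `parse_tag`, are my
assumptions for illustration, not an agreed format.

```python
from typing import Optional, Tuple

# Assumed scheme prefixes; only the -5 scheme may carry an ':other' refinement.
SCHEMES = ("iso-639-3", "iso-639-5")


def _check(scheme: str, code: str, other: Optional[str]) -> None:
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected an alpha-3 code, got {code!r}")
    if scheme == "iso-639-3" and other:
        raise ValueError("a refinement is only allowed with iso-639-5")


def format_tag(scheme: str, code: str, other: Optional[str] = None) -> str:
    """Render scheme, alpha-3 code and optional refinement as one string."""
    _check(scheme, code, other)
    parts = [scheme, code.lower()]
    if other:
        parts.append(other)
    return ":".join(parts)


def parse_tag(tag: str) -> Tuple[str, str, Optional[str]]:
    """Split a tag back into (scheme, code, other); other is None when absent."""
    scheme, _, rest = tag.partition(":")
    code, _, other = rest.partition(":")
    _check(scheme, code, other or None)
    return scheme, code.lower(), other or None
```

For example, `format_tag("iso-639-5", "xho", "mpon1252")` gives `iso-639-5:xho:mpon1252`, and a
3166 region such as `ZA-EC` can go in the same slot when there is no Glottolog entry.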

