openoffice-l10n mailing list archives

From Rob Weir
Subject Re: Language codes ???
Date Mon, 18 Mar 2013 11:49:28 GMT
On Sat, Mar 16, 2013 at 5:51 AM, Andrea Pescetti wrote:
> janI wrote:
>> I have the following codes (directories):
>> af brx dz eu he ka ky my om ro ...
>> Where  can I find the relation between the directory names and the
>> languages (human names), someone (I think andrea) mentioned it was country
>> codes ?
> We don't use country codes, we rely on the LANGUAGE codes, which are ISO
> standards. So, in general:
> - if it is a two-letter code, look it up in ISO 639-1:
>  ("af" -> "Afrikaans")
> - if it is a three-letter code, use ISO 639-2 or (more complete, extends
> 639-2) 639-3: ("pap" -> "Papiamento")
>> I expected dialects within a language to be written as e.g. es_XX, and I
>> know there is an ongoing effort on translating to
>>     Catalan Euskadi and Gallego
> No, this would be a dangerous approach! There is a lot of "political
> correctness" at work here. Everything that is in ISO is a language. So all
> languages spoken in Spain have equal dignity and their own codes. Catalan is
> "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed all three of
> them.
>> I am also a bit puzzled about pt_BR and ca_XV
> These are extensions made to accommodate language variants. Languages in the
> form '[a-z]*_[A-Z]*' are an internal convention to be read as:
> language_PLACE. So en_US means "English, as spoken in the US"; en_GB =
> "English, as spoken in Great Britain"; pt_BR = "Portuguese, as spoken in
> Brazil"; ca_XV = "Catalan, as spoken in Valencia [or Comunidad Valenciana]".
> zh_CN and zh_TW are often called "simplified" and "traditional" Chinese,
> instead of being linked to China and Taiwan as the two codes would mean.
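
As a rough illustration of the rules quoted above, the sketch below sorts a directory name into "two-letter code", "three-letter code", or "language_PLACE variant". The function name classify and the exact pattern are assumptions made for this example: the pattern is stricter than the '[a-z]*_[A-Z]*' form quoted above, and nothing here is taken from the actual build scripts.

```python
import re

# Assumption: the language part is 2-3 lowercase letters and the place part
# is exactly 2 uppercase letters. The convention quoted above is looser.
VARIANT = re.compile(r"([a-z]{2,3})_([A-Z]{2})")


def classify(code):
    """Describe how a localization directory name should be read."""
    m = VARIANT.fullmatch(code)
    if m:
        return f"variant: language '{m.group(1)}', place '{m.group(2)}'"
    if re.fullmatch(r"[a-z]{2}", code):
        return "ISO 639-1 language code"
    if re.fullmatch(r"[a-z]{3}", code):
        return "ISO 639-2 / 639-3 language code"
    return "unrecognised"


if __name__ == "__main__":
    for name in ["af", "brx", "pt_BR", "ca_XV"]:
        print(name, "->", classify(name))
```

This only checks the shape of the code; mapping a code to its human-readable language name still needs the ISO tables themselves.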

Do you know why we don't just follow the IETF's recommendations in
this area?  They have a similar scheme, BCP 47, but use a hyphen
rather than underscore, e.g., en-US, pt-BR.  This is what is used on
the web in general, e.g., in HTTP headers.


They even take it a step further, which might be useful in some cases.
For example:  sr-Latn-RS means Serbian language written in Latin
script, as used in Serbia.
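
To make the comparison concrete, here is a minimal sketch of turning an underscore-style code into a hyphenated tag and splitting a tag such as sr-Latn-RS into language, script, and region. The helper names to_bcp47 and split_tag are made up for this example. It handles only those three subtag kinds and is not a full BCP 47 parser (no variants, extensions, or private-use subtags), and a mechanical underscore-to-hyphen swap does not guarantee the result is a registered tag.

```python
def to_bcp47(code):
    """Mechanically rewrite language_PLACE as language-PLACE."""
    return code.replace("_", "-")


def split_tag(tag):
    """Split a simple language[-Script][-REGION] tag into its parts.

    Assumption: scripts are 4 letters, regions are 2 letters or 3 digits;
    any other subtag is ignored by this sketch.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            result["script"] = part.title()
        elif (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        ):
            result["region"] = part.upper()
    return result


if __name__ == "__main__":
    print(to_bcp47("pt_BR"))
    print(split_tag("sr-Latn-RS"))
```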


> Regards,
>   Andrea.