openoffice-l10n mailing list archives

From Rob Weir <robw...@apache.org>
Subject Re: Language codes ???
Date Mon, 18 Mar 2013 11:49:28 GMT
On Sat, Mar 16, 2013 at 5:51 AM, Andrea Pescetti <pescetti@apache.org> wrote:
> janI wrote:
>>
>> I have the following codes (directories):
>> af brx dz eu he ka ky my om ro ...
>>
>> Where can I find the mapping between the directory names and the
>> languages (human names)? Someone (I think Andrea) mentioned they were
>> country codes.
>
>
> We don't use country codes, we rely on the LANGUAGE codes, which are ISO
> standards. So, in general:
> - if it is a two-letter code, look it up in ISO 639-1:
> http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes  ("af" -> "Afrikaans")
> - if it is a three-letter code, use ISO 639-2 or (more complete, extends
> 639-2) 639-3: http://en.wikipedia.org/wiki/List_of_ISO_639-3_codes ("pap" ->
> "Papiamento")
>
>
>> I expected dialects within a language to be written as e.g. es_XX, and I
>> know there is an ongoing effort on translating to
>>     Catalan Euskadi and Gallego
>
>
> No, this would be a dangerous approach! There is a lot of "political
> correctness" at work here. Everything that is in ISO is a language. So all
> languages spoken in Spain have equal dignity and their own codes. Catalan is
> "ca", Basque/Euskadi is "eu", Gallego is "gl" and you listed all three of
> them.
>
>
>> I am also a bit puzzled about pt_BR and ca_XV
>
>
> These are extensions made to accommodate language variants. Languages in the
> form '[a-z]*_[A-Z]*' are an internal convention to be read as:
> language_PLACE. So en_US means "English, as spoken in the US"; en_GB =
> "English, as spoken in Great Britain"; pt_BR = "Portuguese, as spoken in
> Brazil"; ca_XV = "Catalan, as spoken in Valencia [or Comunidad Valenciana]".
> zh_CN and zh_TW are often called "simplified" and "traditional" Chinese,
> instead of being linked to China and Taiwan as the two codes would mean.
>
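
The conventions quoted above can be sketched roughly as follows. This is
only an illustration: the `describe` helper is hypothetical, and limiting
the place suffix to exactly two uppercase letters is an assumption (the
quoted pattern '[a-z]*_[A-Z]*' is looser).

```python
import re

# Assumed shape: a 2- or 3-letter lowercase language code, optionally
# followed by "_" and a two-letter uppercase place (e.g. pt_BR, ca_XV).
CODE = re.compile(r"^([a-z]{2,3})(?:_([A-Z]{2}))?$")


def describe(code):
    """Return (language, place, standard) or None if the code does not fit."""
    match = CODE.match(code)
    if match is None:
        return None
    language, place = match.group(1), match.group(2)
    # Two-letter codes are looked up in ISO 639-1, three-letter ones in
    # ISO 639-2 or the more complete ISO 639-3.
    standard = "ISO 639-1" if len(language) == 2 else "ISO 639-2 or 639-3"
    return language, place, standard


if __name__ == "__main__":
    for name in ["af", "brx", "pt_BR", "ca_XV"]:
        print(name, describe(name))
```

The function only says which list to consult; the human-readable name
still has to come from the ISO tables themselves.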

Do you know why we don't just follow the IETF's recommendations in
this area?  They have a similar scheme, BCP 47, but use a hyphen
rather than underscore, e.g., en-US, pt-BR.  This is what is used on
the web in general, e.g., in HTTP headers.

See:   http://www.rfc-editor.org/bcp/bcp47.txt

They even take it a step further, which might be useful in some cases.
For example:  sr-Latn-RS means Serbian language written in Latin
script, as used in Serbia.
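
As a rough illustration of how such tags break down, here is a simplified
sketch. The `split_tag` helper is hypothetical and handles only the
language[-Script][-REGION] shape; BCP 47 allows further subtags (variants,
extensions, private use) that this ignores. It also accepts the underscore
form, on the assumption that our directory names map one-to-one.

```python
def split_tag(tag):
    """Split a tag like 'sr-Latn-RS' or 'pt_BR' into its main subtags."""
    # Treat the underscore convention as equivalent to the hyphen form.
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Script subtags are four letters, conventionally title case.
            result["script"] = part.title()
        elif len(part) == 2 and part.isalpha():
            # Region subtags are two letters, conventionally upper case.
            result["region"] = part.upper()
        elif len(part) == 3 and part.isdigit():
            # Regions may also be three-digit numeric area codes.
            result["region"] = part
        # Anything else (variants, extensions) is skipped in this sketch.
    return result


if __name__ == "__main__":
    for tag in ["sr-Latn-RS", "pt_BR", "en-US"]:
        print(tag, split_tag(tag))
```

A real implementation would validate subtags against the IANA language
subtag registry rather than relying on length alone.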

-Rob



> Regards,
>   Andrea.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: l10n-unsubscribe@openoffice.apache.org
> For additional commands, e-mail: l10n-help@openoffice.apache.org
>


