lucene-solr-user mailing list archives

From Mike Drob <md...@apache.org>
Subject Re: Which one is it "cs" or "cz" for Czech language?
Date Tue, 17 Mar 2015 17:45:57 GMT
Probably a historical artifact.

cz is the country code for the Czech Republic; cs is the language code for
Czech. At one time, cs was also the country code for Czechoslovakia, leading
some folks to accidentally conflate the two.
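
For the mapping quoted below, a minimal sketch of the change Eduard describes
might look like the following. It assumes the stock "text_cz" field type is
still defined in the schema and only renames the dynamic field suffix to the
language code; the field type name itself is left as Solr ships it.

```xml
<!-- Sketch only: assumes a fieldType named "text_cz" exists in this schema. -->
<!-- The suffix uses the language code "cs" that the client requests at runtime. -->
<dynamicField name="*_cs" type="text_cz" indexed="true" stored="true"
              multiValued="true" />
```

Documents already indexed under *_cz fields would presumably need reindexing
(or a second mapping kept for the old suffix) for the rename to take effect.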

On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru <enygma2002@gmail.com>
wrote:

> Hi,
>
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at
> all.
>
> We are using Solr's dynamic fields in our project (XWiki), and we have
> recently noticed a problem [1] with the Czech language.
>
> Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true"
> multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO
> language code for Czech [2]) and it obviously fails (due to the mapping).
>
> Now, we can easily fix this on our end by fixing the mapping to
> name="*_cs", but what we are really wondering now is why does Lucene/Solr
> use "cz" (country code) instead of "cs" (language code) in both its
> "text_cz" field and its "stopwords_cz.txt" file?
>
> Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> Is it going to be fixed?
>
> Thanks,
> Eduard
>
> ----------
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
>
