lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Which one is it "cs" or "cz" for Czech language?
Date Wed, 18 Mar 2015 12:37:54 GMT
It does indeed appear that use of the "_cz" suffix is a mistake - those
suffixes are supposed to be language codes. Sure, generally, there tends to
be a one-to-one relationship between language and country, but clearly that
is not as absolute as a casual observer might misguidedly think.

I think it's worth a Jira - text types should use language codes, not
country codes.
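
In the meantime, here is a minimal sketch of the schema-side workaround
described in the quoted message below. Only the dynamic field suffix changes
to the language code; the field type keeps its current name, text_cz, on the
assumption that it has not yet been renamed in your schema. The attributes
are copied from the quoted mapping.

```xml
<!-- Sketch only: match fields by ISO language code ("cs"), while still
     pointing at the existing Czech field type, which is named "text_cz". -->
<dynamicField name="*_cs" type="text_cz" indexed="true" stored="true"
              multiValued="true" />
```

With a pattern like this, a field such as title_cs (a hypothetical name)
would match and be analyzed with the existing Czech field type.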

-- Jack Krupansky

On Tue, Mar 17, 2015 at 1:35 PM, Eduard Moraru <enygma2002@gmail.com> wrote:

> Hi,
>
> First of all, a bit of a disclaimer: I am not a Czech language speaker, at
> all.
>
> We are using Solr's dynamic fields in our project (XWiki), and we have
> recently noticed a problem [1] with the Czech language.
>
> Basically, our mapping says something like this:
>
> <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true"
> multiValued="true" />
>
> ...but at runtime, we ask for the language code "cs" (which is the ISO
> language code for Czech [2]) and it obviously fails (due to the mapping).
>
> Now, we can easily fix this on our end by changing the mapping to
> name="*_cs", but what we are really wondering is why Lucene/Solr uses "cz"
> (country code) instead of "cs" (language code) in both its "text_cz" field
> type and its "stopwords_cz.txt" file.
>
> Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> Is it going to be fixed?
>
> Thanks,
> Eduard
>
> ----------
> [1] http://jira.xwiki.org/browse/XWIKI-11897
> [2] http://en.wikipedia.org/wiki/Czech_language
>
