lucene-solr-user mailing list archives

From: Eduard Moraru <enygma2...@gmail.com>
Subject: Re: Which one is it "cs" or "cz" for Czech language?
Date: Wed, 18 Mar 2015 09:17:21 GMT
Hi,

On Wed, Mar 18, 2015 at 9:28 AM, steve <sc_shepard@hotmail.com> wrote:

> FYI: http://www.w3schools.com/tags/ref_country_codes.asp CZECH REPUBLIC CZ
> No entry for CS
>

Exactly, steve. "CZ" is the country code; however, we are talking about the
language code (which is "CS"), since those Solr types deal with languages,
not with countries.
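
For reference, a minimal sketch of the fix on our (XWiki) side, assuming we
only rename the dynamic field suffix to the language code and keep pointing
it at Solr's existing "text_cz" field type (keeping that type name is an
assumption about our setup, not something Solr requires):

```xml
<!-- Sketch only: suffix renamed from "_cz" to the language code "_cs". -->
<!-- The type still refers to Solr's "text_cz" field type (assumed unchanged). -->
<dynamicField name="*_cs" type="text_cz" indexed="true" stored="true"
              multiValued="true" />
```

With a mapping like this, a field such as "title_cs" (a hypothetical field
name) would match the pattern and be handled by the Czech field type.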

Or were you trying to point out something else?

Thanks,
Eduard

P.S.: Here are the ISO two-letter language codes for reference:
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

> From: mdrob@apache.org
> > Date: Tue, 17 Mar 2015 12:45:57 -0500
> > Subject: Re: Which one is it "cs" or "cz" for Czech language?
> > To: solr-user@lucene.apache.org
> >
> > Probably a historical artifact.
> >
> > cz is the country code for the Czech Republic; cs is the language code for
> > Czech. Once, cs was also the country code for Czechoslovakia, leading some
> > folks to accidentally conflate the two.
> >
> > On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru <enygma2002@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > First of all, a bit of a disclaimer: I am not a Czech language speaker,
> > > at all.
> > >
> > > We are using Solr's dynamic fields in our project (XWiki), and we have
> > > recently noticed a problem [1] with the Czech language.
> > >
> > > Basically, our mapping says something like this:
> > >
> > > <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true"
> > > multiValued="true" />
> > >
> > > ...but at runtime, we ask for the language code "cs" (which is the ISO
> > > language code for Czech [2]) and it obviously fails (due to the mapping).
> > >
> > > Now, we can easily fix this on our end by changing the mapping to
> > > name="*_cs", but what we are really wondering is why Lucene/Solr uses
> > > "cz" (country code) instead of "cs" (language code) in both its "text_cz"
> > > field type and its "stopwords_cz.txt" file.
> > >
> > > Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> > > Is it going to be fixed?
> > >
> > > Thanks,
> > > Eduard
> > >
> > > ----------
> > > [1] http://jira.xwiki.org/browse/XWIKI-11897
> > > [2] http://en.wikipedia.org/wiki/Czech_language
> > >
>
>
