lucene-solr-user mailing list archives

From steve <sc_shep...@hotmail.com>
Subject RE: Which one is it "cs" or "cz" for Czech language?
Date Wed, 18 Mar 2015 07:28:45 GMT
FYI: http://www.w3schools.com/tags/ref_country_codes.asp
CZECH REPUBLIC: CZ
No entry for CS
> From: mdrob@apache.org
> Date: Tue, 17 Mar 2015 12:45:57 -0500
> Subject: Re: Which one is it "cs" or "cz" for Czech language?
> To: solr-user@lucene.apache.org
> 
> Probably a historical artifact.
> 
> cz is the country code for the Czech Republic, cs is the language code for
> Czech. Once, cs was also the country code for Czechoslovakia, leading some
> folks to accidentally conflate the two.
> 
> On Tue, Mar 17, 2015 at 12:35 PM, Eduard Moraru <enygma2002@gmail.com>
> wrote:
> 
> > Hi,
> >
> > First of all, a bit of a disclaimer: I am not a Czech language speaker, at
> > all.
> >
> > We are using Solr's dynamic fields in our project (XWiki), and we have
> > recently noticed a problem [1] with the Czech language.
> >
> > Basically, our mapping says something like this:
> >
> > <dynamicField name="*_cz" type="text_cz" indexed="true" stored="true"
> > multiValued="true" />
> >
> > ...but at runtime, we ask for the language code "cs" (which is the ISO
> > language code for Czech [2]) and it obviously fails (due to the mapping).
> >
> > Now, we can easily fix this on our end by fixing the mapping to
> > name="*_cs",
> > but what we are really wondering now is why does Lucene/Solr use "cz"
> > (country code) instead of "cs" (language code) in both its "text_cz" field
> > and its "stopwords_cz.txt" file?
> >
> > Is that a mistake on the Solr/Lucene side? Is it some kind of convention?
> > Is it going to be fixed?
> >
> > Thanks,
> > Eduard
> >
> > ----------
> > [1] http://jira.xwiki.org/browse/XWIKI-11897
> > [2] http://en.wikipedia.org/wiki/Czech_language
> >
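
For anyone hitting the same mismatch on the client side, here is a minimal sketch of a workaround, assuming the schema keeps the "*_cz" suffix from the mapping quoted above and the application asks for the ISO language code "cs". The alias table and the function name dynamic_field_name are hypothetical, made up for illustration; they are not part of Solr or XWiki.

```python
# Hypothetical alias table: ISO 639-1 language code -> suffix used in the schema.
# Only the Czech entry comes from this thread; anything else would be an assumption.
FIELD_SUFFIX_ALIASES = {"cs": "cz"}


def dynamic_field_name(base: str, language_code: str) -> str:
    """Build the dynamic field name for `base` in the given language.

    Codes without an alias are used unchanged as the field suffix.
    """
    code = language_code.strip().lower()
    suffix = FIELD_SUFFIX_ALIASES.get(code, code)
    return f"{base}_{suffix}"


if __name__ == "__main__":
    print(dynamic_field_name("title", "cs"))
    print(dynamic_field_name("title", "en"))
```

Renaming the mapping to name="*_cs", as suggested in the quoted message, is probably the cleaner fix; an alias like this only helps if the schema cannot be changed.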
 		 	   		  