lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Language Identification and Stemming
Date Fri, 01 Mar 2013 22:00:56 GMT
Hi,

Q1. Detection by itself does not change which stemmer runs. You use langid
for the detection, and it can map your chosen field(s) to language-specific
names, e.g. title -> title_en or title_de. To get language-specific stemming,
stopwords etc., you then need to configure your schema with a separate
fieldType for every language you want to support.
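
A minimal sketch of what this could look like, assuming the langdetect
flavour of the langid processor, a chain named "langid", input fields
"title" and "body", a language field "language_s", and only English and
German enabled. Apart from "title", those names are illustrative and do not
come from the original question; the stopword file paths are assumptions too.

```xml
<!-- solrconfig.xml: detect the language and rename fields, e.g. title to title_en -->
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- fields to run detection on (assumed names) -->
    <str name="langid.fl">title,body</str>
    <!-- field that stores the detected language code (assumed name) -->
    <str name="langid.langField">language_s</str>
    <!-- map each input field to a field with a language suffix -->
    <bool name="langid.map">true</bool>
    <!-- only accept languages the schema has field types for -->
    <str name="langid.whitelist">en,de</str>
    <str name="langid.fallback">en</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- schema.xml: one fieldType per language, each with its own stopwords and stemmer -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>

<!-- dynamic fields catch the mapped names such as title_en, body_de -->
<dynamicField name="*_en" type="text_en" indexed="true" stored="true"/>
<dynamicField name="*_de" type="text_de" indexed="true" stored="true"/>
```

With a setup along these lines, queries have to target the mapped fields
(title_en, title_de) rather than the original field name.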

Q2. You set up update.chain in your request handler and that's it.
It is not possible to return the detected language, or any other response
from the UpdateProcessors, to the client. You'll need to fetch the indexed
document.
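
As a rough sketch, wiring a chain into a request handler could look like the
fragment below. The chain name "langid" and the field "language_s" are
assumed names, and the stock update handler is shown; whether a custom
handler class picks up update.chain depends on how it is implemented.

```xml
<!-- solrconfig.xml: route updates through the (assumed) "langid" chain by default -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>

<!-- The update response will not contain the detected language.
     After a commit, read it back from the indexed document, e.g.:
       /select?q=id:DOC_ID&fl=id,language_s
     where DOC_ID is a placeholder and language_s is the assumed
     langid.langField. -->
```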

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 1 March 2013 at 21:49, "Vinay B," <vybe3142@gmail.com> wrote:

> As I understand, SOLR allows us to plug in language detection
> processors: http://wiki.apache.org/solr/LanguageDetection
> 
> Given that our use case involves a collection of mixed language documents,
> Q1: Assume that we plug in language detection. Will this affect the
> stemming and other language-specific operations, e.g. will the stemmers
> use the correct language identified by the language detection code?
> http://www.early-dance.de/news/9188-optimizing-apachesolr-non-english-languages
> Q2: Currently, we don't explicitly use a processor chain for our
> updates, just a custom update handler that also returns custom
> opcodes etc. in the response. If we plug in language detection via an
> update chain connected to this request handler, (how) can we pass the
> chosen language back via the response?
> 
>    <requestHandler name="/update/myupdatet"
>                  class="com.xyz.MyDocUpdateHandler" />
> 
> Thanks

