lucene-solr-user mailing list archives

From Gereon Steffens <ger...@steffens.org>
Subject Re: multi-language searching with Solr
Date Wed, 07 May 2008 19:46:20 GMT
I have the same requirement, and from what I understand, the distributed 
search feature will help implement this by having one shard per 
language. Am I right?

Gereon


Mike Klaas wrote:
> On 5-May-08, at 1:28 PM, Eli K wrote:
> 
>> Wouldn't this impact both indexing and search performance and the size
>> of the index?
>> It is also probable that I will have more than one free-text field
>> later on and with at least 20 languages this approach does not seem
>> very manageable.  Are there other options for making this work with
>> stemming?
> 
> If you want stemming, then you have to execute one query per language 
> anyway, since the stemming will be different in every language.
> 
> This is a fundamental requirement: you somehow need to track the 
> language of every token if you want correct multi-language stemming.  
> The easiest way to do this would be to split each language into its own 
> field.  But there are other options: you could prefix every indexed 
> token with the language:
> 
> en:The en:quick en:brown en:fox en:jumped ...
> fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...
> 
> Separate fields seem easier to me, though.
> 
> -Mike
> 
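To make the two options Mike describes concrete, here is a rough sketch in Python. It is only an illustration, not Solr code: the field names (text_en, text_fr), the helper names, and the whitespace tokenization are all assumptions, and in a real setup the prefixing or per-field stemming would happen in the analysis chain configured in the schema.

```python
from collections import defaultdict


def prefix_tokens(text, lang):
    """Prefix option: tag every whitespace-separated token with its language code."""
    return [f"{lang}:{token}" for token in text.split()]


def to_language_fields(docs, base="text"):
    """Separate-fields option: route each (lang, text) pair into its own field.

    The "<base>_<lang>" naming is hypothetical; each such field would be
    configured with the stemmer for that language.
    """
    fields = defaultdict(list)
    for lang, text in docs:
        fields[f"{base}_{lang}"].append(text)
    return dict(fields)


def per_language_query(term, langs, base="text"):
    """Build one query clause per language field and OR the clauses together."""
    return " OR ".join(f"{base}_{lang}:{term}" for lang in langs)


if __name__ == "__main__":
    print(prefix_tokens("The quick brown fox", "en"))
    print(to_language_fields([("en", "The quick brown fox"),
                              ("fr", "Le renard brun")]))
    print(per_language_query("fox", ["en", "fr"]))
```

Either way the language travels with the token, which is the requirement stated above. With separate fields the query has to name one clause per language, which is why 20 languages means 20 clauses per free-text field.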


