lucene-solr-user mailing list archives

From Mike Klaas <mike.kl...@gmail.com>
Subject Re: multi-language searching with Solr
Date Wed, 07 May 2008 22:50:47 GMT
I don't really see how that would help, no.  All the benefits from  
using separate indices would be gained by using one field per  
language, ISTM.
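
To illustrate the one-field-per-language approach at query time, here is a minimal sketch. It assumes the `dismax` query parser can be selected with `defType`, and that each hypothetical `text_<lang>` field has a field type whose analyzer includes that language's stemmer. Because `dismax` analyzes the query separately for every field listed in `qf`, a single request effectively runs one stemmed query per language. The sketch only builds the request parameters and does not contact a Solr server.

```python
from typing import Dict, Iterable
from urllib.parse import urlencode


def per_language_params(
    user_query: str, langs: Iterable[str], field_prefix: str = "text"
) -> Dict[str, str]:
    """Build dismax request parameters that search one field per language.

    Each field in qf (e.g. text_en, text_fr) analyzes the query with its own
    field type, so (assuming those types stem) the text is stemmed per language.
    """
    fields = [f"{field_prefix}_{lang}" for lang in langs]
    if not fields:
        raise ValueError("at least one language is required")
    return {"defType": "dismax", "q": user_query, "qf": " ".join(fields)}


params = per_language_params("brown fox", ["en", "fr", "de"])
print(params["qf"])       # text_en text_fr text_de
print(urlencode(params))  # defType=dismax&q=brown+fox&qf=text_en+text_fr+text_de
```

If the user's language is known, passing only that language keeps the query cheaper than searching all of them.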

By the way, Solr has features that make the field-per-language  
approach much easier, especially if there are many fields.  By using  
dynamic fields, you don't have to declare every field explicitly:

<dynamicField name="*_fr" type="text_fr" ... />
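
A minimal client-side sketch of feeding such a schema, assuming documents arrive as Python dicts carrying a language code under a hypothetical `lang` key. It renames each free-text field to `<field>_<lang>` so it matches the `*_<lang>` dynamic field (for example `*_fr` above) and is indexed with that language's analyzer. The field names `title` and `body` are illustrative only.

```python
from typing import Any, Dict, Iterable


def to_language_fields(
    doc: Dict[str, Any], text_fields: Iterable[str], lang_key: str = "lang"
) -> Dict[str, Any]:
    """Rename free-text fields to '<field>_<lang>' so they match a
    '*_<lang>' dynamicField (e.g. '*_fr').

    Fields not listed in text_fields (ids, the language code, ...) are
    copied unchanged. lang_key is a hypothetical field holding the code.
    """
    lang = doc[lang_key]
    wanted = set(text_fields)
    out: Dict[str, Any] = {}
    for name, value in doc.items():
        out[f"{name}_{lang}" if name in wanted else name] = value
    return out


doc = {"id": "42", "lang": "fr", "title": "Le renard brun", "body": "Le renard a sauté."}
print(to_language_fields(doc, ["title", "body"]))
# {'id': '42', 'lang': 'fr', 'title_fr': 'Le renard brun', 'body_fr': 'Le renard a sauté.'}
```

Because one `*_fr` pattern covers every French text field (`title_fr`, `body_fr`, ...), 20 languages need roughly 20 dynamic-field declarations rather than one declaration per field per language.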

-Mike

On 7-May-08, at 12:46 PM, Gereon Steffens wrote:

> I have the same requirement, and from what I understand the  
> distributed search feature will help implement this, by having  
> one shard per language. Am I right?
>
> Gereon
>
>
> Mike Klaas wrote:
>> On 5-May-08, at 1:28 PM, Eli K wrote:
>>> Wouldn't this impact both indexing and search performance and the  
>>> size
>>> of the index?
>>> It is also probable that I will have more than one free-text field
>>> later on and with at least 20 languages this approach does not seem
>>> very manageable.  Are there other options for making this work with
>>> stemming?
>> If you want stemming, then you have to execute one query per  
>> language anyway, since the stemming will be different in every  
>> language.
>> This is a fundamental requirement: you somehow need to track the  
>> language of every token if you want correct multi-language  
>> stemming.  The easiest way to do this would be to split each  
>> language into its own field.  But there are other options: you  
>> could prefix every indexed token with the language:
>> en:The en:quick en:brown en:fox en:jumped ...
>> fr:Le fr:brun fr:renard fr:vite fr:a fr:sauté ...
>> Separate fields seem easier to me, though.
>> -Mike
>
>
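
The token-prefixing alternative mentioned in the quoted reply can be sketched as follows. This is an illustration under assumptions: the whitespace tokenizer, the lowercasing, and the toy `strip_s` stemmer are placeholders, not real Solr analyzers, and the `STEMMERS` table is hypothetical. The point it shows is the order of operations: stem with the language's own rules first, then prefix, so `en:` and `fr:` tokens can share one field without colliding. Query text must go through the same function so that its tokens match the indexed ones.

```python
from typing import Callable, Dict, List

Stemmer = Callable[[str], str]


def strip_s(token: str) -> str:
    """Toy stemmer for illustration only: drops a trailing 's'."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


# Hypothetical language -> stemmer table; real code would plug in
# proper per-language stemmers here.
STEMMERS: Dict[str, Stemmer] = {
    "en": strip_s,
    "fr": lambda t: t,  # identity: no stemming in this sketch
}


def prefixed_tokens(lang: str, text: str) -> List[str]:
    """Lowercase, split on whitespace, stem with the language's stemmer,
    then prefix each token with '<lang>:'."""
    stem = STEMMERS[lang]
    return [f"{lang}:{stem(tok.lower())}" for tok in text.split()]


print(prefixed_tokens("en", "The quick brown fox jumps"))
# ['en:the', 'en:quick', 'en:brown', 'en:fox', 'en:jump']
print(prefixed_tokens("en", "jumps"))  # query side: ['en:jump']
```

Every query term has to be prefixed and stemmed the same way, which is part of why separate per-language fields are usually the simpler option.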

