lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: How to apply relevant Stemmer to each document
Date Thu, 22 Dec 2011 23:46:01 GMT
Sure, but what about inappropriate stemming in one language that
happens to match something in another?

In general, putting multiple languages into a single field only
makes sense when the overwhelming majority of documents are in
one language...
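
A toy sketch of that kind of collision, in Python. The suffix rules
and the language code "xx" are made up for illustration; they are not
the real Snowball or Lucene stemmers, and the dict index only stands in
for a single shared Solr field.

```python
# Toy suffix-stripping "stemmers". Hypothetical rules for illustration only.
SUFFIXES = {
    "en": ("ing", "e"),  # crude English-like rules
    "xx": ("en",),       # made-up second language
}


def stem(word, lang):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES[lang]:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Stem each doc with its own language's rules into one shared field."""
    index = {}
    for doc_id, lang, text in docs:
        for token in text.split():
            index.setdefault(stem(token, lang), set()).add(doc_id)
    return index


def search(index, query, query_lang):
    """The query has no per-document language, so one stemmer must be chosen."""
    hits = set()
    for token in query.split():
        hits |= index.get(stem(token, query_lang), set())
    return hits


docs = [
    ("d1", "en", "drive carefully"),
    ("d2", "xx", "driven"),  # pretend this is an unrelated word in language xx
]
index = build_index(docs)

# English query stemmer: finds d1, but also d2, whose stem collides by accident.
print(sorted(search(index, "driving", "en")))  # ['d1', 'd2']

# The other stemmer leaves "driving" untouched, so nothing matches.
print(sorted(search(index, "driving", "xx")))  # []
```

With these toy rules the English-stemmed query does find d1, but it
also pulls in d2 purely because the two stemmers happen to produce the
same term.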

Best
Erick

On Thu, Dec 22, 2011 at 2:41 PM,  <alxsss@aim.com> wrote:
> Hi Erick,
>
> Why would querying be wrong?
>
> It is my understanding that if I have, let's say, 3 docs and each of them
> has been indexed with its own language stemmer, then sending a query will
> search all docs and return matching results? Let's say the query is
> "driving" and one of the docs contains "drive" and was stemmed by the
> English stemmer; then it would return 1 result, whereas if I had applied
> the Russian stemmer to all docs the result would be 0 docs?
>
> Am I missing something?
>
> Thanks.
> Alex.
>
> -----Original Message-----
> From: Erick Erickson <erickerickson@gmail.com>
> To: solr-user <solr-user@lucene.apache.org>
> Sent: Thu, Dec 22, 2011 11:06 am
> Subject: Re: How to apply relevant Stemmer to each document
>
>
> Not really. And it's hard to see how this would work in practice,
> because stemming the document (even if you could) is only
> half the battle.
>
> How would querying work then? No matter what language you used
> for your stemming, it would be wrong for all the documents that used a
> different stemmer (or a stemmer based on a different language).
>
> So I wouldn't hold out too much hope here.
>
> Best
> Erick
>
> On Wed, Dec 21, 2011 at 4:09 PM,  <alxsss@aim.com> wrote:
>> Hello,
>>
>> I would like to know whether, in the latest version of Solr, it is possible
>> to apply the relevant stemmer to each doc depending on its lang field.
>> I searched the solr-user mailing list and found this thread
>>
>> http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-td3235341.html
>>
>> but I am not sure whether it was developed into a JIRA ticket.
>>
>> Thanks.
>> Alex.
>>
>>
>
>
