lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Deciding whether to stem at query time
Date Tue, 24 Apr 2012 14:16:25 GMT
Hi Andrew,

This would not necessarily increase the size of your index that much - you don't need to
store both fields, just one of them, and only if you really need it for highlighting or
display. If not, just index both fields without storing either.

Otis 
----
Performance Monitoring for Solr - http://sematext.com/spm/solr-performance-monitoring
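
For illustration, the "query both fields, weight the exact one higher" approach quoted
below could be expressed as request parameters along these lines. This is only a sketch
under assumptions: the field names text_exact and text_stemmed and the boost of 2.0 are
hypothetical, and it assumes the (e)dismax query parser, whose qf parameter accepts
per-field boosts.

```python
from urllib.parse import urlencode


def build_params(user_query, exact_field="text_exact",
                 stemmed_field="text_stemmed", exact_boost=2.0):
    """Build Solr request parameters that search both fields at once.

    The field names and the boost value are placeholders for this sketch,
    not names taken from anyone's actual schema.
    """
    return {
        "defType": "edismax",  # query parser that understands qf boosts
        "q": user_query,
        # Unstemmed field is weighted higher so exact matches rank first;
        # the stemmed field still contributes matches on related word forms.
        "qf": f"{exact_field}^{exact_boost} {stemmed_field}",
    }


params = build_params("lighter")
query_string = urlencode(params)  # append to the /select URL of your core
print(query_string)
```

With this shape no UI switch is needed: every query hits both fields, and the boost
decides how strongly exact matches outrank stemmed ones.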



>________________________________
> From: Andrew Wagner <wagner.andrew@gmail.com>
>To: solr-user@lucene.apache.org 
>Sent: Tuesday, April 24, 2012 7:21 AM
>Subject: Re: Deciding whether to stem at query time
> 
>Ah, this is a really good point. It still seems to have the downsides of
>#2, though: much bigger space requirements and possibly some time lost on
>queries.
>
>On Mon, Apr 23, 2012 at 3:35 PM, Walter Underwood <wunder@wunderwood.org> wrote:
>
>> There is a third approach. Create two fields and always query both of
>> them, with the exact field given a higher weight. This works great and
>> performs well.
>>
>> It is what we did at Netflix and what I'm doing at Chegg.
>>
>> wunder
>>
>> On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:
>>
>> > So I just realized the other day that stemming basically happens at index
>> > time. If I'm understanding correctly, there's no way to allow a user to
>> > specify, at run time, whether to stem particular words or not based on a
>> > single index. I think there are two options, but I'd love to hear that I'm
>> > wrong:
>> >
>> > 1.) Incrementally build up a white list of words that don't stem very well.
>> > To pick a random example out of the blue, "light" isn't super closely
>> > related to "lighter", so I might choose not to stem that. If I wanted to
>> > do this, I think (if I understand correctly), stemmerOverrideFilter would
>> > help me out with this. I'm not a big fan of this approach.
>> >
>> > 2.) Index all the text in two fields, once with stemming and once without.
>> > Then build some kind of option into the UI for specifying whether to stem
>> > the words or not, and search the appropriate field. Unfortunately, this
>> > would roughly double the size of my index, and probably affect query times
>> > too. Plus, the UI would probably suck.
>> >
>> > Am I missing an option? Has anyone tried one of these approaches?
>> >
>> > Thanks!
>> > Andrew
>>