lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Deciding whether to stem at query time
Date Mon, 23 Apr 2012 19:35:21 GMT
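There is a third approach. Index the text into two fields, one stemmed and one
exact (unstemmed), and always query both of them, with the exact field given a
higher weight. Exact matches then rank above stem-only matches, and no UI
option is needed. This works great and performs well.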
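
As a rough sketch of what "always query both" could look like from a client,
here is how the request parameters might be built. This assumes the edismax
query parser and its qf parameter; the field names text_exact and text_stemmed
and the boost of 2.0 are made up for illustration and are not from any real
schema.

```python
from urllib.parse import urlencode

# Hypothetical field names; a real schema would define both fields,
# typically with a copyField feeding the same source text into each.
EXACT_FIELD = "text_exact"      # analyzed without a stemmer
STEMMED_FIELD = "text_stemmed"  # analyzed with a stemmer


def build_params(user_query, exact_boost=2.0):
    """Build Solr request parameters that search both fields at once."""
    return {
        "defType": "edismax",
        "q": user_query,
        # qf lists the fields to search; "^" attaches a per-field boost,
        # so a hit in the exact field counts for more than a stemmed hit.
        "qf": f"{EXACT_FIELD}^{exact_boost} {STEMMED_FIELD}",
    }


params = build_params("lighter")
query_string = urlencode(params)  # ready to append to a /select URL
print(query_string)
```

The boost value is something to tune against real queries rather than a
recommended setting.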

It is what we did at Netflix and what I'm doing at Chegg.

wunder

On Apr 23, 2012, at 12:21 PM, Andrew Wagner wrote:

> So I just realized the other day that stemming basically happens at index
> time. If I'm understanding correctly, there's no way to allow a user to
> specify, at run time, whether to stem particular words or not based on a
> single index. I think there are two options, but I'd love to hear that I'm
> wrong:
> 
> 1.) Incrementally build up a white list of words that don't stem very well.
> To pick a random example out of the blue, "light" isn't super closely
> related to "lighter", so I might choose not to stem that. If I wanted to
> do this, I think (if I understand correctly), stemmerOverrideFilter would
> help me out with this. I'm not a big fan of this approach.
> 
> 2.) Index all the text in two fields, once with stemming and once without.
> Then build some kind of option into the UI for specifying whether to stem
> the words or not, and search the appropriate field. Unfortunately, this
> would roughly double the size of my index, and probably affect query times
> too. Plus, the UI would probably suck.
> 
> Am I missing an option? Has anyone tried one of these approaches?
> 
> Thanks!
> Andrew





