lucene-solr-user mailing list archives

From Steven White <swhite4...@gmail.com>
Subject Re: Default stop word list
Date Tue, 30 Aug 2016 00:39:22 GMT
Thanks, Shawn.  This is the best answer I have seen; much appreciated.

A follow-up question: I want to remove stop words from the list, but if I
do, search quality will degrade (and the index will grow, though that is
less of an issue).  For example, if I remove "a", then when someone searches
for "For a Few Dollars More" (without quotes), records that contain "a" but
are not relevant to the search are likely to rank higher.  How can I address
this?  Can I set up my schema so that records that match words from a given
list (say, the stop word list) are ranked lower?

Steve
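A minimal sketch of the ranking idea in the question, assuming a toy in-memory scorer rather than Solr: query terms found on a hypothetical low-value word list contribute a reduced weight (LOW_WEIGHT = 0.1, an arbitrary assumption), so documents that match only those words rank below documents that match the content words. Lucene's own scoring (TF-IDF or BM25) already gives very common terms such as "a" a low IDF, so this only illustrates the concept; it is not Solr schema configuration, and the function names are made up for the example.

```python
import re

# Hypothetical list of low-value words; matches on these count for less.
LOW_VALUE_WORDS = {"a", "an", "the", "for", "of"}
LOW_WEIGHT = 0.1  # assumed down-weight factor, chosen arbitrarily


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def score(query, doc):
    """Sum a per-term weight for every query term found in the document."""
    doc_terms = set(tokenize(doc))
    total = 0.0
    for term in tokenize(query):
        if term in doc_terms:
            total += LOW_WEIGHT if term in LOW_VALUE_WORDS else 1.0
    return total


def rank(query, docs):
    """Return documents by descending score; ties keep their input order."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)


docs = [
    "A Fistful of Dollars",
    "A Beautiful Mind",
    "For a Few Dollars More",
]
query = "For a Few Dollars More"
for d in rank(query, docs):
    print(f"{score(query, d):.1f}  {d}")
```

Here "A Beautiful Mind" matches the query only on "a", so it lands last instead of competing with titles that share real content words.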

On Sat, Aug 27, 2016 at 2:53 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 8/27/2016 12:39 PM, Shawn Heisey wrote:
> > I personally think that stopword removal is more of a problem than a
> > solution.
>
> There actually is one thing that a stopword filter can do that has little
> to do with the purpose it was designed for.  You can make it impossible
> to search for certain words.
>
> Imagine that your original data contains the word "frisbee" but for some
> reason you do not want anybody to be able to locate results using that
> word.  You can create a stopword list containing just "frisbee" and any
> other variations that you want to limit like "frisbees", then place it
> as a filter on the index side of your analysis.  With this in place,
> searching for those terms will retrieve zero results.
>
> Thanks,
> Shawn
>
>
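A toy Python model of the behavior Shawn describes, assuming a simple in-memory inverted index rather than Solr's analysis chain: a blocked-word list (here just "frisbee" and "frisbees") is applied only when documents are indexed, so those terms never enter the index and a query for them matches nothing. In Solr the equivalent is a stop filter in the index-time analyzer; the function names below are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical blocked-word list, applied on the index side only.
INDEX_STOPWORDS = {"frisbee", "frisbees"}


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on non-alphanumerics, and drop any stopwords."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
    return [t for t in tokens if t not in stopwords]


def build_index(docs):
    """Map each indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in analyze(text, INDEX_STOPWORDS):  # index-side filter
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = analyze(query)  # query side has no stop filter
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


docs = ["Red frisbee for the beach", "Beach towel", "Two frisbees and a ball"]
index = build_index(docs)
print(sorted(search(index, "frisbee")))  # []
print(sorted(search(index, "beach")))    # [0, 1]
```

Because the filter runs only at index time, the query still asks for "frisbee", but no document was ever indexed under that term, so the result is empty.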
