lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: handling stopwords for special scenarios
Date Thu, 09 Apr 2020 15:28:40 GMT
Agreed, leave the stopwords alone. I ran into this same problem
thirteen years ago at Netflix. Even before that, I wasn’t removing 
stopwords, but I accidentally left them in the Solr 1.3 config.

https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Apr 9, 2020, at 7:34 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> 1> why use stopwords at all? They’re largely a holdover from the
>     bad old days when memory was limited. I usually recommend
>     people just start by not using stopwords at all.
> 
> 2> assuming <1> doesn’t work for you, why doesn’t it look feasible
>      to remove “here” from the stopword list? True, you have to re-index.
> 
> But what you’re asking for is not possible. Stopwords are simply gone
> from the index without a trace, there’s absolutely no way to reconstruct
> one.
> 
> I’d also argue that this is an N+1 situation. Sure, you’ll solve the “here”
> problem by removing it from the stopword list, but then you’ll have
> the same problem with “there”…
> 
> Best,
> Erick
> 
>> On Apr 9, 2020, at 9:10 AM, rashi gandhi <gandhirashi19@gmail.com> wrote:
>> 
>> Hi All,
>> 
>> We are using the stopword filter factory at both index and search time
>> to omit the stopwords.
>> 
>> However, for one particular case, we are getting "here" as a search query
>> and "here" is one of the words in the title/name representing our client.
>> We are returning zero results because "here" is one of the English
>> language stopwords, so it is omitted both while indexing and while
>> searching.
>> 
>> One solution could be that I remove "here" from the list of stopwords,
>> however, that does not look feasible.
>> 
>> Is there any way we can handle this kind of case, where
>> stopwords are meant to be actual search terms?
>> 
>> Any leads would be appreciated.
> 
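
The behaviour described in this thread can be reproduced with a toy inverted index. The sketch below is only an illustration in Python, not Solr's analysis chain: the stopword list, the document titles, and the helper names (analyze, build_index, search) are all made up for the example. It shows that when the same stop filter runs at index and query time, "here" never reaches the index and the query ends up empty, whereas indexing without a stop filter makes the term searchable again.

```python
# Toy illustration only; this is NOT how Solr implements analysis.
# STOPWORDS, the titles, and all function names are hypothetical.
STOPWORDS = frozenset({"a", "an", "the", "here", "there", "is", "of"})


def analyze(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, and drop any stopwords."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def build_index(docs, stopwords=frozenset()):
    """Map each surviving token to the set of document ids containing it."""
    index = {}
    for doc_id, title in docs.items():
        for tok in analyze(title, stopwords):
            index.setdefault(tok, set()).add(doc_id)
    return index


def search(index, query, stopwords=frozenset()):
    """Return ids of documents that contain every surviving query token."""
    tokens = analyze(query, stopwords)
    if not tokens:
        return set()  # every query term was removed as a stopword
    return set.intersection(*(index.get(tok, set()) for tok in tokens))


docs = {1: "Here Travel Co", 2: "Acme Maps"}  # made-up titles

with_stop = build_index(docs, STOPWORDS)   # stop filter at index time
without_stop = build_index(docs)           # no stop filter

print("here" in with_stop)                      # False: the token was never indexed
print(search(with_stop, "here", STOPWORDS))     # set()
print(search(without_stop, "here"))             # {1}
```

Once a token has been dropped at index time there is nothing left to match against, which is why changing the stopword list (or removing the filter) only helps after a re-index.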

