lucene-solr-user mailing list archives

From Ahmet Arslan <>
Subject Re: Boost non stemmed keywords (KStem filter)
Date Thu, 19 Nov 2015 21:47:18 GMT

I wonder about using two fields (text_stem and text_no_stem) and applying query time boost
text_stem^0.3 text_no_stem^0.6

What is the advantage of the keyword repeat/payload approach compared with this one?
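
To make the two-field idea concrete, here is a rough sketch of how such a request could be assembled. The field names and boosts are the ones given above; the host, the core name, the use of edismax, and the assumption that both fields are populated from the same source text (for example via copyField) are illustrative guesses, not something stated in this thread.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host and core name are placeholders.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_boosted_query(user_query, stem_boost=0.3, no_stem_boost=0.6):
    """Return Solr request parameters that search both fields with different weights."""
    return {
        "q": user_query,
        # Assumption: an edismax-style parser that understands the qf parameter.
        "defType": "edismax",
        # The unstemmed field gets the higher boost, so exact forms rank higher.
        "qf": f"text_stem^{stem_boost} text_no_stem^{no_stem_boost}",
    }


params = build_boosted_query("chinese")
url = SOLR_SELECT_URL + "?" + urlencode(params)
print(url)
```

A document that matches only in text_stem still scores, but one that also matches in text_no_stem gets the extra, more heavily weighted contribution.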


On Thursday, November 19, 2015 10:24 PM, Markus Jelsma <>
Hello Jan - I have no code I can show, but we are using it to power our search servers. You
are correct, you need to deal with payloads at query time as well. This means you need a custom
similarity, and you also need to customize your query parser to rewrite queries to payload-supporting types.
This is not very hard either; some ancient examples can still be found on the web. But you also
need to copy over existing TokenFilters to emit payloads wherever you want them. Overriding TokenFilters
is usually impossible due to crazy private members (I still cannot figure out why so many
parts are private..)
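
As a plain-Python illustration only (this is not Lucene code and uses none of its API), the index-time side can be pictured as a keyword-repeat step followed by stemming and duplicate removal, with each token tagged by a payload value. The `Token` type, the payload constants and the lookup-table stemmer are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, Iterator, NamedTuple

# Hypothetical payload values, not Lucene constants.
UNSTEMMED = 0
STEMMED = 1


class Token(NamedTuple):
    term: str
    position_increment: int  # 0 means "same position as the previous token"
    payload: int


def repeat_stem_and_tag(terms: Iterable[str], stem: Callable[[str], str]) -> Iterator[Token]:
    """Emit the original term, then its stem at the same position if it differs."""
    for term in terms:
        # Always emit the original form first, tagged as unstemmed.
        yield Token(term, 1, UNSTEMMED)
        stemmed = stem(term)
        # Skip the second token when stemming changed nothing (duplicate removal).
        if stemmed != term:
            yield Token(stemmed, 0, STEMMED)


# Toy lookup table standing in for a real stemmer such as KStem.
TOY_STEMS = {"chinese": "china"}

tokens = list(repeat_stem_and_tag(["chinese", "food"], lambda t: TOY_STEMS.get(t, t)))
for token in tokens:
    print(token)
```

In a real analysis chain the payload would be attached inside the token filter itself, which is why the filters have to be copied rather than subclassed.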

It can be very powerful, especially if you use payloads not just to contain a score but
to carry a WORD_TYPE, such as stemmed, unstemmed, but also stopwords, acronyms, compound
and subwords, headings or normal text, and also NER types (which we don't have yet). For this
to work you just need to treat the payload as a bitset of the different types, so you can have
really tuneable scoring at query time via your similarity. Unfortunately, payloads can only
carry a relatively small number of bits :)
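
A minimal sketch of the bitset idea, again in plain Python rather than as a Lucene similarity: each word type gets one bit, the flags are packed into a single payload byte, and a query-time function turns the flags back into a score multiplier. The flag names follow the types listed above; the bit assignments and the weights are made up for illustration.

```python
from enum import IntFlag


class WordType(IntFlag):
    """One bit per word type; the assignments are hypothetical."""
    STEMMED = 1
    STOPWORD = 2
    ACRONYM = 4
    COMPOUND = 8
    SUBWORD = 16
    HEADING = 32


# Hypothetical tuning knobs: multiply the score by these when a flag is set.
TYPE_FACTORS = {
    WordType.STEMMED: 0.5,
    WordType.STOPWORD: 0.2,
    WordType.HEADING: 2.0,
}


def encode_payload(flags: WordType) -> bytes:
    """Pack the flags into a single byte (six flags fit easily)."""
    return bytes([int(flags)])


def score_factor(payload: bytes) -> float:
    """Decode the flags and combine the weights of every flag that is set."""
    flags = WordType(payload[0])
    factor = 1.0
    for word_type, weight in TYPE_FACTORS.items():
        if word_type in flags:
            factor *= weight
    return factor


print(score_factor(encode_payload(WordType.STEMMED)))
print(score_factor(encode_payload(WordType.STEMMED | WordType.HEADING)))
```

Because the weights live outside the index, they can be changed at query time without reindexing; only adding a new word type requires touching the indexed payloads.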


-----Original message-----
> From:Jan Høydahl <>
> Sent: Thursday 19th November 2015 14:30
> To:
> Subject: Re: Boost non stemmed keywords (KStem filter)
> Do you have a concept code for this? Don’t you also have to hack your query parser,
> e.g. dismax, to use other Query objects supporting payloads?
> --
> Jan Høydahl, search solution architect
> Cominvent AS -
> > On 18 Nov 2015, at 22:24, Markus Jelsma <> wrote:
> > 
> > Hi - easiest approach is to use KeywordRepeatFilter and RemoveDuplicatesTokenFilter.
> > This creates a slightly higher IDF for unstemmed words which might be just enough in your
> > case. We found it not to be enough, so we also attach payloads to signify stemmed words amongst
> > others. This allows you to decrease score for stemmed words at query time via your similarity.
> > 
> > M.
> > 
> > 
> > 
> > -----Original message-----
> >> From:bbarani <>
> >> Sent: Wednesday 18th November 2015 22:07
> >> To:
> >> Subject: Boost non stemmed keywords (KStem filter)
> >> 
> >> Hi,
> >> 
> >> I am using KStem factory for stemming. This stemmer converts 'france' to
> >> 'french', 'chinese' to 'china', etc. I am good with this stemming but I am
> >> trying to boost the results that contain the original term compared to the
> >> stemmed terms. Is this possible?
> >> 
> >> Thanks,
> >> Learner
> >> 
> >> 
> >> 
> >> 
> >> 
