lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Boost non stemmed keywords (KStem filter)
Date Thu, 19 Nov 2015 23:36:09 GMT
That is the approach I’ve been using for years. Simple and effective.

It probably makes the index bigger. Make sure that only one of the fields is stored, because
the stored text will be exactly the same in both.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)
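
A minimal sketch of the two-field query discussed in this thread, assuming the edismax query parser and the field names and boosts quoted below (text_stem^0.3, text_no_stem^0.6). The helper name build_boosted_params is hypothetical; it only assembles request parameters and does not contact a Solr server.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query, stem_boost=0.3, no_stem_boost=0.6):
    """Build request parameters that search both fields with different boosts."""
    return {
        "q": user_query,
        # Assumption: the edismax parser is used; the thread only mentions dismax.
        "defType": "edismax",
        # Field names and default boosts are the ones quoted in the thread.
        "qf": f"text_stem^{stem_boost} text_no_stem^{no_stem_boost}",
    }


params = build_boosted_params("france")
query_string = urlencode(params)  # URL-encoded form, ready to append to a select URL
print(query_string)
```

Because the unstemmed field carries the higher boost, a document matching the original word form should score above one that matches only through the stem.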


> On Nov 19, 2015, at 1:47 PM, Ahmet Arslan <iorixxx@yahoo.com.INVALID> wrote:
> 
> Hi,
> 
> I wonder about using two fields (text_stem and text_no_stem) and applying a query-time boost:
> text_stem^0.3 text_no_stem^0.6
> 
> What is the advantage of the keyword repeat/payload approach compared with this one?
> 
> Ahmet
> 
> 
> On Thursday, November 19, 2015 10:24 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> Hello Jan - I have no code I can show, but we are using it to power our search servers.
> You are correct, you need to deal with payloads at query time as well. This means you need
> a custom similarity, and you also need to customize your query parser to rewrite queries to
> payload-supporting types. This is not very hard either; some ancient examples can still be
> found on the web. But you also need to copy over existing TokenFilters to emit payloads
> whenever you want. Overriding TokenFilters is usually impossible due to crazy private members
> (I still cannot figure out why so many parts are private...)
> 
> It can be very powerful, especially if you use payloads not just to hold a score but to
> carry a WORD_TYPE, such as stemmed or unstemmed, but also stopwords, acronyms, compounds
> and subwords, headings or normal text, and NER types (which we don't have yet). For this
> to work you just need to treat the payload as a bitset for different types, so you can have
> really tunable scoring at query time via your similarity. Unfortunately, payloads can only
> carry a relatively small number of bits :)
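
The bitset idea above might look roughly like the following sketch. The flag layout, the multipliers, and the function names are all hypothetical; the thread does not give any, and this is plain Python rather than a Lucene Similarity implementation.

```python
from enum import IntFlag


class WordType(IntFlag):
    # Hypothetical flag layout; one bit per word type.
    STEMMED = 1
    STOPWORD = 2
    ACRONYM = 4
    HEADING = 8


# Hypothetical per-type score multipliers, tunable at query time.
MULTIPLIERS = {
    WordType.STEMMED: 0.5,
    WordType.STOPWORD: 0.1,
    WordType.ACRONYM: 1.5,
    WordType.HEADING: 2.0,
}


def encode_payload(flags: WordType) -> bytes:
    """Pack the type flags into a single payload byte."""
    return int(flags).to_bytes(1, "big")


def payload_factor(payload: bytes) -> float:
    """Multiply together the factors of every type bit set in the payload."""
    flags = WordType(int.from_bytes(payload, "big"))
    factor = 1.0
    for word_type, multiplier in MULTIPLIERS.items():
        if word_type in flags:
            factor *= multiplier
    return factor
```

A token flagged as both STEMMED and HEADING would get 0.5 * 2.0 here, so the heading bonus and the stemming penalty cancel out; keeping the multipliers in a table is what makes the scoring adjustable without reindexing.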
> 
> M.
> 
> -----Original message-----
>> From:Jan Høydahl <jan.asf@cominvent.com>
>> Sent: Thursday 19th November 2015 14:30
>> To: solr-user@lucene.apache.org
>> Subject: Re: Boost non stemmed keywords (KStem filter)
>> 
>> Do you have concept code for this? Don’t you also have to hack your query parser,
>> e.g. dismax, to use other Query objects supporting payloads?
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
>>> 18. nov. 2015 kl. 22.24 skrev Markus Jelsma <markus.jelsma@openindex.io>:
>>> 
>>> Hi - the easiest approach is to use KeywordRepeatFilter and RemoveDuplicatesTokenFilter.
>>> This creates a slightly higher IDF for unstemmed words, which might be just enough in your
>>> case. We found it not to be enough, so we also attach payloads to signify stemmed words,
>>> amongst others. This allows you to decrease the score for stemmed words at query time via
>>> your similarity impl.
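
As a rough illustration of what the filter chain above produces, here is a plain-Python simulation, not the Lucene API. It assumes the usual behaviour: the original token is kept as a keyword, a stemmed copy is emitted at the same position, and the copy is dropped when stemming changed nothing. The toy stemmer and its word list are hypothetical stand-ins for KStem.

```python
def keyword_repeat_then_stem(tokens, stem):
    """Return (term, position) pairs: each original token plus its stem, if different."""
    output = []
    for position, token in enumerate(tokens):
        output.append((token, position))  # keyword copy, left unstemmed
        stemmed = stem(token)
        if stemmed != token:  # an identical term at the same position is a duplicate
            output.append((stemmed, position))
    return output


# Toy stand-in for a real stemmer (hypothetical word list).
TOY_STEMS = {"running": "run", "cars": "car"}


def toy_stem(token):
    return TOY_STEMS.get(token, token)


result = keyword_repeat_then_stem(["running", "cars", "fast"], toy_stem)
print(result)
```

Each changed word ends up indexed under two terms at one position, which is why the unstemmed form, being rarer, tends to get the higher IDF mentioned above.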
>>> 
>>> M.
>>> 
>>> 
>>> 
>>> -----Original message-----
>>>> From:bbarani <bbarani@gmail.com>
>>>> Sent: Wednesday 18th November 2015 22:07
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Boost non stemmed keywords (KStem filter)
>>>> 
>>>> Hi,
>>>> 
>>>> I am using KStem factory for stemming. This stemmer converts 'france' to
>>>> 'french', 'chinese' to 'china', etc. I am good with this stemming but I am
>>>> trying to boost the results that contain the original term compared to the
>>>> stemmed terms. Is this possible?
>>>> 
>>>> Thanks,
>>>> Learner
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> View this message in context: http://lucene.472066.n3.nabble.com/Boost-non-stemmed-keywords-KStem-filter-tp4240880.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>> 
>> 
>> 

