lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Removing duplicate terms from query
Date Thu, 09 Feb 2017 13:15:42 GMT
How about a pattern replace char filter that checks for repeating groups? It's probably not
the fastest option, but it should work right away.
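
As a rough illustration of the "repeating groups" idea, here is a minimal sketch of such a pattern. It is written in Python only to show the regex; the function name collapse_repeats and the exact pattern are assumptions, not something taken from Solr. In Solr, a similar expression would presumably go into the pattern attribute of solr.PatternReplaceCharFilterFactory (which uses Java regex syntax) and would need testing there. Note that it only collapses repeats that directly follow each other.

```python
import re

# Hypothetical pattern: a word followed by one or more repetitions of
# itself, separated only by non-word characters (spaces, commas, ...).
REPEATED_WORD = re.compile(r"\b(\w+)(?:\W+\1\b)+", re.IGNORECASE)


def collapse_repeats(query: str) -> str:
    """Replace each run of adjacent repeated words with its first occurrence."""
    return REPEATED_WORD.sub(r"\1", query)


if __name__ == "__main__":
    print(collapse_repeats("term term term"))
    print(collapse_repeats("Hey, hey, hey ; Shivers of pleasure"))
```

Repeats separated by other words (for example "a b a") are left untouched by this pattern.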
 
-----Original message-----
> From:Emir Arnautovic <emir.arnautovic@sematext.com>
> Sent: Thursday 9th February 2017 13:52
> To: solr-user@lucene.apache.org
> Subject: Re: Removing duplicate terms from query
> 
> Hi Ere,
> 
> I don't think there is such a filter. Implementing one would 
> require looking backward, which violates the streaming approach of token 
> filters and leads to unpredictable memory usage.
> 
> I would do it as part of a query preprocessor, and not necessarily as part 
> of Solr.
> 
> HTH,
> Emir
> 
> 
> On 09.02.2017 12:24, Ere Maijala wrote:
> > Hi,
> >
> > I just noticed that while we use RemoveDuplicatesTokenFilter at 
> > query time, it takes term positions into account and so does not really 
> > do anything if the query is, e.g., 'term term term'. As far as I can see, 
> > the term positions make no difference in a simple non-phrase search. Is 
> > there a built-in way to deal with this? I know I can write a filter to do 
> > this, but I feel like this would be something quite basic to do for 
> > the query. And I don't think it's anything too unusual for normal 
> > users to do. Just consider, e.g., searching for music by title:
> >
> > Hey, hey, hey ; Shivers of pleasure
> >
> > I also verified that, at least according to debugQuery=true and 
> > anecdotal evidence, the search really slows down if you repeat the same 
> > term enough.
> >
> > --Ere
> 
> -- 
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
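
The query-preprocessor approach suggested in the quoted reply could look roughly like the sketch below, run in the client before the query is sent to Solr. It is only an illustration under stated assumptions: the function name dedupe_terms is hypothetical, and the code assumes a simple whitespace-separated, non-phrase query with no operators, field prefixes, or quoted phrases.

```python
def dedupe_terms(query: str) -> str:
    """Drop repeated terms from a simple query, keeping first occurrences in order.

    Assumption: terms are separated by whitespace and compared
    case-insensitively; query syntax (quotes, AND/OR, field:value)
    is not handled.
    """
    seen = set()
    kept = []
    for term in query.split():
        key = term.casefold()
        if key not in seen:
            seen.add(key)
            kept.append(term)
    return " ".join(kept)


if __name__ == "__main__":
    print(dedupe_terms("term term term"))
    print(dedupe_terms("hey shivers hey of hey"))
```

Unlike a streaming token filter, this keeps the whole set of seen terms in memory, which is acceptable for a single user query.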
