lucene-solr-user mailing list archives

From Alan Woodward <a...@flax.co.uk>
Subject Re: Queries for many terms
Date Tue, 03 Nov 2015 10:15:33 GMT
TermsQuery works by pulling the postings list for each term and OR-ing them together to create
a bitset, which is very memory-efficient but means that you don't know at doc collection time
which terms have actually matched.
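
To illustrate why the term information is lost, here is a toy model in plain Java. It is not
Lucene code: the map of int arrays is a hypothetical stand-in for postings lists, and
unionPostings is a made-up helper name. It only shows that once doc IDs are OR-ed into one
bitset, each document is a single bit no matter how many terms set it.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: plain int arrays stand in for Lucene postings lists.
class BitsetUnionSketch {

    // OR every term's postings (doc IDs) into a single bitset.
    static BitSet unionPostings(Map<String, int[]> postings, int maxDoc) {
        BitSet matches = new BitSet(maxDoc);
        for (int[] docIds : postings.values()) {
            for (int docId : docIds) {
                // Setting the bit records "some term matched this doc";
                // which term it was is discarded at this point.
                matches.set(docId);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical postings: term -> doc IDs containing it.
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("apple", new int[] {0, 2, 5});
        postings.put("banana", new int[] {2, 3});
        postings.put("cherry", new int[] {5});

        BitSet matches = unionPostings(postings, 8);
        // Docs 2 and 5 matched two terms each, but the bitset cannot say so.
        System.out.println(matches); // prints {0, 2, 3, 5}
    }
}
```

The bitset costs one bit per document however many terms are in the query, which is where the
memory efficiency comes from.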

For your case you probably want to create a SpanOrQuery, and then iterate through the resulting
Spans in a specialised Collector.  Depending on how many terms you want, though, you may end
up requiring a lot of memory for the search.
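
As a rough picture of the memory trade-off, the sketch below tallies how many query terms hit
each document. It is a toy model in plain Java, not the SpanOrQuery/Collector approach itself:
the postings map and the countMatchingTerms helper are hypothetical, and it assumes each doc ID
appears at most once per term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: plain int arrays stand in for Lucene postings lists.
class MatchCountSketch {

    // Keep one counter per document instead of one bit per document.
    // Assumes each doc ID appears at most once in a given term's array.
    static int[] countMatchingTerms(Map<String, int[]> postings, int maxDoc) {
        int[] counts = new int[maxDoc];
        for (int[] docIds : postings.values()) {
            for (int docId : docIds) {
                counts[docId]++; // one more query term matched this doc
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical postings: term -> doc IDs containing it.
        Map<String, int[]> postings = new LinkedHashMap<>();
        postings.put("apple", new int[] {0, 2, 5});
        postings.put("banana", new int[] {2, 3});
        postings.put("cherry", new int[] {5});

        int[] counts = countMatchingTerms(postings, 8);
        for (int doc = 0; doc < counts.length; doc++) {
            if (counts[doc] > 0) {
                System.out.println("doc " + doc + " matched " + counts[doc] + " term(s)");
            }
        }
    }
}
```

Here every document needs an int rather than a bit, and recording which terms matched (not
just how many) would need more state again.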

Alan Woodward
www.flax.co.uk


On 2 Nov 2015, at 17:14, Upayavira wrote:

> I have a scenario where I want to search for documents that contain many
> terms (maybe 100s or 1000s), and then know the number of terms that
> matched. I'm happy to implement this as a query object/parser.
> 
> I understand that Lucene isn't well suited to this scenario. Any
> suggestions as to how to make this more efficient? Does the TermsQuery
> work differently from the BooleanQuery regarding large numbers of terms?
> 
> Upayavira

