lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: How to limit SimpleCollector at N documents?
Date Fri, 18 Aug 2017 06:42:10 GMT
You could write a collector wrapper (have a look at FilterCollector maybe)
that throws a CollectionTerminatedException once more than X hits have
been collected in total. It will likely stop in the middle of the first
segment, and then again before collecting each further segment.

FYI you can throw a CollectionTerminatedException not only from the collect
method but also from the getLeafCollector method, which lets you skip a
segment entirely before even starting to look for matches.
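To make the control flow concrete, here is a minimal, self-contained sketch of the idea. It does not use Lucene itself: ToyTerminated, ToyLeafCollector, ToyCollector and ToySearcher are hypothetical stand-ins for CollectionTerminatedException, LeafCollector, Collector and the per-segment loop of the searcher, reduced to the bare minimum, and segments are identified by a plain int. Treat it as an illustration of the technique under those assumptions, not as the Elasticsearch collector linked below.

```java
// Hypothetical stand-in for Lucene's CollectionTerminatedException.
class ToyTerminated extends RuntimeException {}

// Hypothetical stand-ins for Lucene's LeafCollector and Collector,
// each reduced to the single method that matters here.
interface ToyLeafCollector { void collect(int doc); }
interface ToyCollector { ToyLeafCollector getLeafCollector(int segment); }

// Wrapper that stops delegating once maxHits documents have been collected.
class LimitingCollector implements ToyCollector {
    private final ToyCollector in;
    private final int maxHits;
    private int collected = 0;

    LimitingCollector(ToyCollector in, int maxHits) {
        this.in = in;
        this.maxHits = maxHits;
    }

    @Override
    public ToyLeafCollector getLeafCollector(int segment) {
        // Limit already reached: skip this segment before matching starts.
        if (collected >= maxHits) {
            throw new ToyTerminated();
        }
        ToyLeafCollector leaf = in.getLeafCollector(segment);
        return doc -> {
            // Limit reached in the middle of a segment: abandon the rest of it.
            if (collected >= maxHits) {
                throw new ToyTerminated();
            }
            leaf.collect(doc);
            collected++;
        };
    }
}

// Toy search loop: the exception ends the current segment only,
// and the loop then moves on to the next one.
class ToySearcher {
    // Returns how many segments actually handed out a leaf collector.
    static int search(int[][] segments, ToyCollector collector) {
        int segmentsOpened = 0;
        for (int s = 0; s < segments.length; s++) {
            try {
                ToyLeafCollector leaf = collector.getLeafCollector(s);
                segmentsOpened++;
                for (int doc : segments[s]) {
                    leaf.collect(doc);
                }
            } catch (ToyTerminated e) {
                // Swallow and continue with the next segment.
            }
        }
        return segmentsOpened;
    }
}
```

Because the counter lives in the wrapper rather than in the per-segment collector, every segment after the limit is rejected in getLeafCollector, so the search ends cheaply even though it is never aborted outright. In real Lucene code the same shape would extend FilterCollector and wrap the delegate's leaf collector in a FilterLeafCollector.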

We have such a collector in Elasticsearch, feel free to copy-paste it and
adapt to your needs if you want. It is licensed under ASL2:
https://github.com/elastic/elasticsearch/blob/36a5cf8f35e5cbaa1ff857b5a5db8c02edc1a187/core/src/main/java/org/elasticsearch/search/query/EarlyTerminatingCollector.java

On Thu, Aug 17, 2017 at 21:46, Tod Olson <tod@uchicago.edu> wrote:

> Hi everyone,
>
> I'm modifying an existing application, which uses a Lucene SimpleCollector
> to return document ids and some other fields from a search. For various
> reasons, we now want to place an upper bound on the number of documents
> actually collected.
>
> Is there a reasonable way to put a limit on the results returned by a
> SimpleCollector? Or do I need to change Collectors?
>
> Based on the docs, I could keep a counter and raise a
> CollectionTerminatedException after N documents, but then the search moves
> on to the next leaf. I'd like to have the entire search terminate and
> return the collected documents.
>
> Any assistance for a Lucene novice is greatly appreciated!
>
> -Tod
>
>
> Tod Olson <tod@uchicago.edu>
> Systems Librarian
> Interim Director for Integrated Library Systems
> University of Chicago Library
>
>
