lucene-java-user mailing list archives

From Daniel Noll <>
Subject Re: Migrating from Hit/Hits to TopDocs/TopDocCollector
Date Wed, 10 Jun 2009 23:58:12 GMT
On Wed, Jun 10, 2009 at 20:17, Uwe Schindler<> wrote:
> You are right, you can, but if you just want to retrieve all hits, this is
> inefficient. A HitCollector is the correct way to do this (especially
> because the order of hits usually doesn't matter when retrieving all
> hits). Hits and TopDocs are intended for paged results lists.
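For anyone newer to the callback API, here is a minimal sketch of the "retrieve everything" case Uwe describes. It does not use Lucene itself: `HitCollector` below is a hypothetical stand-in with the same `collect(int doc, float score)` shape as Lucene 2.x's class, and `search` is a fake loop over precomputed matches, so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectAllHits {

    /** Hypothetical stand-in with the same shape as Lucene 2.x's HitCollector. */
    public abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /**
     * Hypothetical stand-in for a search: calls the collector once per
     * matching document, with no sorting and no result window.
     */
    static void search(int[] matchingDocs, float[] scores, HitCollector collector) {
        for (int i = 0; i < matchingDocs.length; i++) {
            collector.collect(matchingDocs[i], scores[i]);
        }
    }

    /** Retrieves every hit: no priority queue and no page size to choose. */
    public static List<Integer> allHits(int[] matchingDocs, float[] scores) {
        final List<Integer> docs = new ArrayList<>();
        search(matchingDocs, scores, new HitCollector() {
            @Override
            public void collect(int doc, float score) {
                docs.add(doc); // score ignored: ranking isn't needed here
            }
        });
        return docs;
    }
}
```

With the real library the anonymous collector would be handed to the searcher instead of the fake `search` method. Since nothing is sorted or truncated, there is no "top n" to pick up front, which is why this beats TopDocs when you want every hit.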

As a relevant note, what I have noticed about using HitCollector alone
is that the code effectively loses control of the loop (you get the
same problem with any API where you hand it a callback and let it do
all the work, e.g. SAX.)  The callback is good if you have a
relatively small number of results and/or a relatively fast operation
to perform with each one, but if the process as a whole takes a long
time and the user wants to be able to cancel it, then it isn't great.
It also isn't great if you want to wrap an Iterator or some other
existing API around it.
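To make the cancellation point concrete, here is a minimal sketch, again using a hypothetical `HitCollector` stand-in rather than the real Lucene class. Because the search owns the loop, the only way for the callback to stop early is to throw an unchecked exception. `SearchCancelledException` and the `cancelAfter` parameter (which simulates the user pressing Cancel) are both invented for illustration.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class CancellableCallback {

    /** Hypothetical stand-in with the same shape as Lucene 2.x's HitCollector. */
    public abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /** Hypothetical unchecked exception used to escape the search loop. */
    public static class SearchCancelledException extends RuntimeException {
        public SearchCancelledException() {
            super("search cancelled");
        }
    }

    // Stand-in search: this loop belongs to the library, not the caller.
    static void search(int maxDoc, HitCollector collector) {
        for (int doc = 0; doc < maxDoc; doc++) {
            collector.collect(doc, 1.0f);
        }
    }

    /** Processes hits until the flag is set; returns how many were processed. */
    public static int processUntilCancelled(int maxDoc, final AtomicBoolean cancelled,
                                            final int cancelAfter) {
        final int[] processed = {0};
        try {
            search(maxDoc, new HitCollector() {
                @Override
                public void collect(int doc, float score) {
                    if (cancelled.get()) {
                        throw new SearchCancelledException(); // the only way out
                    }
                    processed[0]++; // the slow per-hit work would go here
                    if (processed[0] == cancelAfter) {
                        cancelled.set(true); // simulates the user pressing Cancel
                    }
                }
            });
        } catch (SearchCancelledException e) {
            // The search was abandoned mid-loop; nothing else to clean up here.
        }
        return processed[0];
    }
}
```

Using exceptions for control flow works, but it is exactly the awkwardness described above: the caller can only ask the callback to bail out, never simply stop looping.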

Our workaround for this is a HitCollector which populates a BitSet
(relatively fast); we then do the slow operation while iterating over
the BitSet.  This also has drawbacks in terms of memory usage, but
that doesn't become a huge problem until you have a very large number
of documents in the index.
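A sketch of this BitSet approach, assuming the same hypothetical `HitCollector` stand-in and fake `search` loop (not the real Lucene classes). Phase one only sets bits; phase two walks the set with `java.util.BitSet.nextSetBit`, which puts the loop back in the caller's hands and makes it easy to expose as a plain `Iterator`.

```java
import java.util.BitSet;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class BitSetHits {

    /** Hypothetical stand-in with the same shape as Lucene 2.x's HitCollector. */
    public abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    // Stand-in search that reports each matching doc to the collector.
    static void search(int[] matchingDocs, HitCollector collector) {
        for (int doc : matchingDocs) {
            collector.collect(doc, 1.0f);
        }
    }

    /** Phase 1: the callback only records doc IDs, which is cheap. */
    public static BitSet collectHits(int[] matchingDocs, int maxDoc) {
        final BitSet bits = new BitSet(maxDoc);
        search(matchingDocs, new HitCollector() {
            @Override
            public void collect(int doc, float score) {
                bits.set(doc);
            }
        });
        return bits;
    }

    /** Phase 2: the caller owns this loop, so it can stop early or wrap it. */
    public static Iterator<Integer> iterator(final BitSet bits) {
        return new Iterator<Integer>() {
            private int next = bits.nextSetBit(0);

            @Override
            public boolean hasNext() {
                return next >= 0;
            }

            @Override
            public Integer next() {
                if (next < 0) {
                    throw new NoSuchElementException();
                }
                int doc = next;
                next = bits.nextSetBit(doc + 1); // -1 once no bits remain
                return doc;
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
}
```

The trade-offs match the ones mentioned above: scores are discarded, hits come back in doc ID order, and the BitSet costs roughly maxDoc/8 bytes no matter how few documents actually matched.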

It's a shame we don't have an inverted kind of HitCollector where we
can say "give me the next hit", so that we can get the best of both
worlds (like what StAX gives us in the XML world.)
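For comparison, a pull-style ("give me the next hit") interface might look like the sketch below. `HitIterator`, `NO_MORE_DOCS`, `termHits`, and `firstHits` are hypothetical names, and the matcher is just a sorted array standing in for a real term's postings. The point is only that the caller drives the loop, so stopping early needs no exception and wrapping another API around it is straightforward.

```java
import java.util.Arrays;

public class PullHits {

    /** Hypothetical pull-style interface: the caller asks for each hit. */
    public interface HitIterator {
        int NO_MORE_DOCS = -1;

        /** Advances to the next matching doc and returns its ID, or NO_MORE_DOCS. */
        int nextDoc();

        /** Score of the current doc; valid only after nextDoc() returned a doc. */
        float score();
    }

    /** Stand-in matcher over a sorted postings list for a single term. */
    public static HitIterator termHits(final int[] sortedDocs, final float[] scores) {
        return new HitIterator() {
            private int pos = -1;

            @Override
            public int nextDoc() {
                pos++;
                return pos < sortedDocs.length ? sortedDocs[pos] : NO_MORE_DOCS;
            }

            @Override
            public float score() {
                return scores[pos];
            }
        };
    }

    /** The caller owns the loop: it stops after n hits without any exception. */
    public static int[] firstHits(HitIterator hits, int n) {
        int[] out = new int[n];
        int count = 0;
        while (count < n) {
            int doc = hits.nextDoc();
            if (doc == HitIterator.NO_MORE_DOCS) {
                break;
            }
            out[count++] = doc;
        }
        return Arrays.copyOf(out, count);
    }
}
```

A cancel check or an `Iterator` adapter would slot into the `while` loop in `firstHits` the same way, with no second pass and no BitSet.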


Daniel Noll
Senior Developer, Nuix
Forensic and eDiscovery Software
The world's most advanced email data analysis and eDiscovery software
