lucene-general mailing list archives

From Grant Ingersoll
Subject Re: Get all document ids from a search.
Date Tue, 15 Dec 2009 14:12:39 GMT

On Dec 14, 2009, at 6:00 PM, Niclas Rothman wrote:

> Hi there,
> Perhaps this is far out, but I need some advice on the following problem.
> We use Lucene for what it is really good at: finding documents by "relevance".
> After a search has been done and I have the hits in my hands, I need to do some heavy
> sorting on this list, where the data used for sorting is stored in the database, not in the
> Lucene index.
> Therefore I need to get all document ids for a search so I can fetch the needed data
> from the database and afterwards apply my custom sorting.
> How can I get all document ids from a search?
> Can this be done with OK performance?
> I have been wondering if I could do the sorting in Lucene, but I don't feel comfortable
> with that at all because of the lack of information / documentation.
> Also, the sorting should preferably be done just in time; that is, the underlying data
> for sorting changes constantly and I can't reindex as soon as the sorting data changes.
> Any ideas / suggestions?

I would look at implementing a custom comparator for the Sort instance in Lucene.  This requires
implementing a FieldComparatorSource and a FieldComparator.  There are lots of examples in
the Lucene code of this.  Note, the name FieldComparatorSource is a bit of a misnomer, as
it doesn't have to be a Field (for instance, on SOLR-1297, I just implemented it to allow
for sorts by Function Queries).  Naturally, getting this to perform with a database is going
to be pretty tricky, but I think it will be way better than having to process all of the results
a second time.  Having an effective caching strategy (similar to Lucene's FieldCache) will
be important.
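
To make the caching point concrete, here is a minimal plain-Java sketch. It is not the Lucene
FieldComparator API; the class CachedSortKeys, its loader function, and the "missing rows sort
last" rule are all hypothetical names and choices made for illustration. The idea is only that
each document's externally stored sort key is fetched once and then reused for every comparison,
which is the lookup a custom comparator would otherwise repeat against the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

/** Illustrative only: caches externally stored sort keys per document number. */
class CachedSortKeys {
    private final Map<Integer, Long> cache = new HashMap<>();
    private final IntFunction<Long> loader; // hypothetical stand-in for a database lookup
    private int loads = 0;                  // counts how often the loader was really called

    public CachedSortKeys(IntFunction<Long> loader) {
        this.loader = loader;
    }

    /** Returns the sort key for a document, hitting the loader only on a cache miss. */
    public long keyFor(int doc) {
        Long cached = cache.get(doc);
        if (cached == null) {
            loads++;
            Long loaded = loader.apply(doc);
            // Assumption: documents without a stored key sort last.
            cached = (loaded != null) ? loaded : Long.MAX_VALUE;
            cache.put(doc, cached);
        }
        return cached;
    }

    /** Drops one cached key, e.g. after the underlying row changed. */
    public void invalidate(int doc) {
        cache.remove(doc);
    }

    public int loads() {
        return loads;
    }

    /** Orders document numbers by their cached sort key, ascending. */
    public Comparator<Integer> comparator() {
        return Comparator.comparingLong(this::keyFor);
    }

    public static void main(String[] args) {
        Map<Integer, Long> fakeDb = new HashMap<>(); // pretend database table
        fakeDb.put(0, 30L);
        fakeDb.put(1, 10L);
        fakeDb.put(2, 20L);
        CachedSortKeys keys = new CachedSortKeys(doc -> fakeDb.get(doc));
        List<Integer> hits = new ArrayList<>(Arrays.asList(0, 1, 2));
        hits.sort(keys.comparator());
        System.out.println(hits + " loads=" + keys.loads()); // prints [1, 2, 0] loads=3
    }
}
```

Because the sort data changes constantly, the invalidate call matters as much as the cache
itself: something has to tell the cache when a row changed, otherwise the sort uses stale keys.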

The other thing you could think about doing is loading a FieldCache with the ids (do it once
when you load the IndexReader) and then use that with a bit set telling you which documents
matched.
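
A rough plain-Java sketch of that second approach follows. The class MatchedIdCollector and its
method names are hypothetical, and the String[] simply stands in for a per-reader array of ids
indexed by document number (in Lucene of this era something like
FieldCache.DEFAULT.getStrings(reader, "id") would be the likely source, but check that against
the version you run). The collect method plays the role of the per-hit callback of a collector.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Illustrative only: marks matching documents in a bit set, then maps them to external ids. */
class MatchedIdCollector {
    private final String[] idByDoc; // loaded once per reader; index = document number
    private final BitSet matched;   // one bit per document, set when the document matched

    public MatchedIdCollector(String[] idByDoc) {
        this.idByDoc = idByDoc;
        this.matched = new BitSet(idByDoc.length);
    }

    /** Called once per hit; only flips a bit, so it stays cheap on large result sets. */
    public void collect(int doc) {
        matched.set(doc);
    }

    /** Returns the external ids of all matched documents, in document-number order. */
    public List<String> matchedIds() {
        List<String> ids = new ArrayList<>(matched.cardinality());
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            ids.add(idByDoc[doc]);
        }
        return ids;
    }

    public static void main(String[] args) {
        MatchedIdCollector collector =
                new MatchedIdCollector(new String[] {"a7", "b3", "c9", "d1"});
        collector.collect(2);
        collector.collect(0);
        System.out.println(collector.matchedIds()); // prints [a7, c9]
    }
}
```

The resulting id list is what would be handed to the database for the custom sort. Note that
relevance order is lost here, since the bit set only records which documents matched.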

In either case, you are making a tradeoff with memory.


Grant Ingersoll

