lucene-java-user mailing list archives

From Steven Rowe
Subject Re: How can I search over all documents NOT in a certain subset?
Date Tue, 05 Jun 2007 20:49:34 GMT
Hi Hilton,

Hilton Campbell wrote:
> Hello all,
> In my application I want to perform a search over all the documents
> that are NOT in a certain subset, and I'm not sure how I should do
> it.
> Specifically, the application performs a search and the top N results
> are shown to the user. The user may then opt to see the next top N 
> results. By the time the user chooses to see the next N results,
> however, there may be new, highly-relevant documents in the index (as
> indexing is occurring concurrently). So instead of just skipping to
> the next N, I need to perform a new search and get the top N that
> haven't been seen yet. Is anyone aware of an efficient way to
> implement this?
> I can think of at least one way: I can keep track of the documents 
> that have been seen and iterate through all the hits, skipping those 
> that have already been seen. I just want to see if there isn't a 
> better way that doesn't iterate through potentially hundreds of 
> already seen hits, or if anyone has any pointers on an efficient
> implementation of this idea.

Conceptually (caveat: untested), you could:

1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per
IndexReader.  The BitSet would hold one bit per doc[2], each initialized
to true.

2. Unset a DejaVuFilter instance's bit for each of your top N docs by
walking the TopDocs returned by search(Query,Filter,int)[3].
For the first call, you could pass in null for the Filter, and then for
all following calls, an instance of DejaVuFilter.

3. Repeat step #2 as many times as necessary.
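
The steps above can be sketched without any Lucene dependency, just to
show the bookkeeping. This is an untested-against-Lucene illustration,
not a real Filter subclass: DejaVuSketch, markSeen, growTo and topN are
hypothetical names, and topN is a toy stand-in for the filtered search
call. In a real DejaVuFilter, the BitSet handed out by bits() here is
roughly what the Filter would return for a given IndexReader. The growTo
method is an added assumption to cover documents indexed between pages;
it assumes existing doc ids stay stable, which may not hold in practice.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a DejaVuFilter: one bit per doc, true = not yet seen. */
class DejaVuSketch {
    private final BitSet unseen = new BitSet();
    private int maxDoc = 0;

    DejaVuSketch(int maxDoc) {
        growTo(maxDoc); // step 1: every bit starts out true
    }

    /** Assumption: newly indexed docs get ids in [maxDoc, newMaxDoc) and start unseen. */
    void growTo(int newMaxDoc) {
        if (newMaxDoc > maxDoc) {
            unseen.set(maxDoc, newMaxDoc);
            maxDoc = newMaxDoc;
        }
    }

    /** Step 2: unset the bit of every doc id already shown to the user. */
    void markSeen(int[] docIds) {
        for (int id : docIds) {
            unseen.clear(id);
        }
    }

    /** Copy of the current bits, as a filter would hand them to the searcher. */
    BitSet bits() {
        return (BitSet) unseen.clone();
    }

    /**
     * Toy replacement for a filtered search: returns up to n doc ids with the
     * highest scores among the allowed docs. A null filter allows every doc.
     */
    static int[] topN(float[] scores, BitSet allowed, int n) {
        BitSet candidates = new BitSet(scores.length);
        if (allowed == null) {
            candidates.set(0, scores.length);
        } else {
            for (int d = allowed.nextSetBit(0); d >= 0 && d < scores.length;
                 d = allowed.nextSetBit(d + 1)) {
                candidates.set(d);
            }
        }
        int[] top = new int[Math.min(n, candidates.cardinality())];
        for (int i = 0; i < top.length; i++) {
            int best = -1;
            for (int d = candidates.nextSetBit(0); d >= 0;
                 d = candidates.nextSetBit(d + 1)) {
                if (best < 0 || scores[d] > scores[best]) {
                    best = d;
                }
            }
            top[i] = best;
            candidates.clear(best); // do not pick the same doc twice
        }
        return top;
    }
}
```

Usage would follow the numbered steps: call topN with a null filter for
the first page, pass the result to markSeen, and then call topN with
bits() for each later page, calling growTo first if the index has grown.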



Steve Rowe
Center for Natural Language Processing
