lucene-java-user mailing list archives

From Steven Rowe
Subject Re: How can I search over all documents NOT in a certain subset?
Date Fri, 08 Jun 2007 12:46:55 GMT
Hi Hilton,

Hilton Campbell wrote:
> Yes, that's actually come up.  The document ids are indeed changing which is
> causing problems.  I'm still trying to work it out myself, but any help
> would most definitely be appreciated.
> Thanks,
> Hilton Campbell
> -----Original Message-----
> From: Antony Bowesman
> Sent: Wednesday, June 06, 2007 11:36 PM
> To:
> Subject: Re: How can I search over all documents NOT in a certain subset?
> Steven Rowe wrote:
>> Conceptually (caveat: untested), you could:
>> 1. Extend Filter[1] (call it DejaVuFilter) to hold a BitSet per
>> IndexReader.  The BitSet would hold one bit per doc[2], each initialized
>> to true.
>> 2. Unset a DejaVuFilter instance's bit for each of your top N docs by
>> walking the TopDocs returned by Searcher.search(Query,Filter,int)[3].
>> Initially, you could pass in null for the Filter, and then for all
>> following calls, an instance of DejaVuFilter.
> Just a thought...
> If Hilton wants to be aware of new Documents in the index since the previous
> search, this requires opening a new IndexReader.
> If only Documents have been added to the index I expect, but am not 
> sure, that the bits from the old IndexReader are still valid for the 
> document numbers in the new Reader. However, if there have been 
> deletions or optimisation has occurred between reader instances, then
> the document ids from the old reader may not represent the same
> documents in the new reader, so the Filter for the old reader will
> not be valid for the new search against the new reader and you may
> get false matches.
> I don't think there will be a problem if there are no deletions.

My bad for not pointing out this shortcoming.
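
To make the shortcoming concrete, here is a minimal plain-Java sketch of the idea (still untested against Lucene itself). It uses no Lucene classes: ReaderSnapshot is a hypothetical stand-in for an IndexReader, and DejaVuFilter, markSeen and allowed are illustrative names only. In Lucene as of this writing, the real thing would presumably extend Filter and override bits(IndexReader) instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of the DejaVuFilter idea; no Lucene classes are used. */
public class DejaVuSketch {

    /** Hypothetical stand-in for an IndexReader: list position = doc number. */
    static final class ReaderSnapshot {
        final List<String> docs;
        ReaderSnapshot(List<String> docs) { this.docs = new ArrayList<>(docs); }
        int maxDoc() { return docs.size(); }
    }

    /** One BitSet per reader; a set bit means "not seen yet, still allowed". */
    static final class DejaVuFilter {
        private final Map<ReaderSnapshot, BitSet> bitsByReader = new IdentityHashMap<>();

        BitSet bits(ReaderSnapshot reader) {
            return bitsByReader.computeIfAbsent(reader, r -> {
                BitSet b = new BitSet(r.maxDoc());
                b.set(0, r.maxDoc()); // every doc starts out allowed
                return b;
            });
        }

        /** Unset the bit for a doc that was already returned in the top N. */
        void markSeen(ReaderSnapshot reader, int docNumber) {
            bits(reader).clear(docNumber);
        }
    }

    /** Names of the docs that the given bits still allow in the given reader. */
    static List<String> allowed(ReaderSnapshot reader, BitSet bits) {
        List<String> out = new ArrayList<>();
        for (int i = bits.nextSetBit(0); i >= 0 && i < reader.maxDoc(); i = bits.nextSetBit(i + 1)) {
            out.add(reader.docs.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        ReaderSnapshot oldReader = new ReaderSnapshot(Arrays.asList("a", "b", "c", "d"));
        DejaVuFilter filter = new DejaVuFilter();
        filter.markSeen(oldReader, 2); // "c" was in the top N
        System.out.println(allowed(oldReader, filter.bits(oldReader))); // [a, b, d]

        // Simulate "a" being deleted and the index optimized: doc numbers shift down.
        ReaderSnapshot newReader = new ReaderSnapshot(Arrays.asList("b", "c", "d"));
        // Reusing the OLD reader's bits against the NEW reader gives wrong answers:
        System.out.println(allowed(newReader, filter.bits(oldReader))); // [b, c]
    }
}
```

In the second call the already-seen "c" slips through and the unseen "d" is dropped, which is the kind of false match Antony describes. Keeping one BitSet per reader instance avoids mixing numberings, but the seen-state then has to be rebuilt some other way whenever a new reader is opened.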

Karl Wettin's patch may be useful to you:



Steve Rowe
Center for Natural Language Processing
