lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: delete a document from indexwriter
Date Tue, 22 Jan 2008 10:07:09 GMT

Exactly, that is the method...

Mike

Cam Bazz wrote:

> Hello,
>
> Did you mean the
>
>  synchronized private void addDeleteDocID(int docId) {
>     bufferedDeleteDocIDs.add(new Integer(docId));
>     numBytesUsed += OBJECT_HEADER_BYTES + BYTES_PER_INT +  
> OBJECT_POINTER_BYTES;
>  }
>
>
> this however does not delete document but rather marks it in  
> bufferedDeleteDocIDs right?
>
> On Jan 22, 2008 11:51 AM, Michael McCandless <  
> lucene@mikemccandless.com> wrote:
>
> Woops, sorry, it's hiding in DocumentsWriter: the addDeleteDocID
> method.
>
> And, it's private.  So you will actually have to modify the Lucene
> core sources (not just subclass IndexWriter) to experiment with
> deleting by docID in IndexWriter.  If you find a clean way to safely
> expose this functionality, please post back with your results!
>
> Mike
>
> Cam Bazz wrote:
>
>> I am looking at the IndexWriter source code - and I could not find  
>> a method (private) to delete by doc id.
>> Where is it hiding?
>>
>> Best,
>> - C.B.
>>
>> On Jan 19, 2008 1:07 PM, Michael McCandless <  
>> lucene@mikemccandless.com> wrote:
>>
>> Good question....
>>
>> So far, this method has not been carried over to IndexWriter because
>> in general it's not really safe, since there's no way to get an
>> accurate docID from IndexWriter itself.
>>
>> You can't really "know" when IndexWriter does merges that compacts
>> deletes and thus changes docIDs.  So, if you open a reader on the
>> side, get a docID you want to delete, and then go and ask IndexWriter
>> to delete that docID, you may in fact delete the wrong document.  In
>> 2.3 , where segment merges are now done with a background thread,  
>> it's
>> even worse, because a merge could complete and be committed, thus
>> changing docIDs, at any time...
>>
>> See complex discussion here:
>>
>>     http://markmail.org/message/wxqel3gd6cmavk5a
>>
>> As of 2.3, the low level infrastructure was added to IndexWriter for
>> deleting by document ID, but this is not exposed publicly (this was a
>> side effect of LUCENE-1112).  It's only used, internally, to delete a
>> document if an exception is hit while indexing it.  In theory, you
>> could then subclass IndexWriter and tap into this infrastructure to
>> delete by docID, but, you're entering dangerous territory!
>>
>> Do you have a specific use case in mind here?  I think we'd like to
>> make this option available someday in IndexWriter, but doing so now
>> (when there is no way to get a "reliable" docID) seems too  
>> dangerous...
>>
>> Mike
>>
>> Cam Bazz wrote:
>>
>> > Hello,
>> >
>> > How do I delete a specific document from an indexwriter? I
>> > understand there
>> > is deleteDocuments(term) which deletes all the documents matching
>> > the term.
>> > But what if I want to delete a document that has more then one  
>> term in
>> > specific. I can search the document with a boolean query, and then
>> > get the
>> > doc id.
>> > I know that doc ids are temporary, but can I not use it for delete?
>> >
>> > IndexReader has a delete by doc id method, but I am not sure how to
>> > use this
>> > when using an indexwriter.
>> >
>> > Best,
>> > C.B.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message