lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cam Bazz" <camb...@gmail.com>
Subject Re: delete a document from indexwriter
Date Tue, 22 Jan 2008 09:43:09 GMT
I am looking at the IndexWriter source code - and I could not find a method
(private) to delete by doc id.
Where is it hiding?

Best,
-C.B.

On Jan 19, 2008 1:07 PM, Michael McCandless <lucene@mikemccandless.com>
wrote:

>
> Good question....
>
> So far, this method has not been carried over to IndexWriter because
> in general it's not really safe, since there's no way to get an
> accurate docID from IndexWriter itself.
>
> You can't really "know" when IndexWriter does merges that compacts
> deletes and thus changes docIDs.  So, if you open a reader on the
> side, get a docID you want to delete, and then go and ask IndexWriter
> to delete that docID, you may in fact delete the wrong document.  In
> 2.3, where segment merges are now done with a background thread, it's
> even worse, because a merge could complete and be committed, thus
> changing docIDs, at any time...
>
> See complex discussion here:
>
>     http://markmail.org/message/wxqel3gd6cmavk5a
>
> As of 2.3, the low level infrastructure was added to IndexWriter for
> deleting by document ID, but this is not exposed publicly (this was a
> side effect of LUCENE-1112).  It's only used, internally, to delete a
> document if an exception is hit while indexing it.  In theory, you
> could then subclass IndexWriter and tap into this infrastructure to
> delete by docID, but, you're entering dangerous territory!
>
> Do you have a specific use case in mind here?  I think we'd like to
> make this option available someday in IndexWriter, but doing so now
> (when there is no way to get a "reliable" docID) seems too dangerous...
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello,
> >
> > How do I delete a specific document from an indexwriter? I
> > understand there
> > is deleteDocuments(term) which deletes all the documents matching
> > the term.
> > But what if I want to delete a document that has more then one term in
> > specific. I can search the document with a boolean query, and then
> > get the
> > doc id.
> > I know that doc ids are temporary, but can I not use it for delete?
> >
> > IndexReader has a delete by doc id method, but I am not sure how to
> > use this
> > when using an indexwriter.
> >
> > Best,
> > C.B.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message