lucene-java-user mailing list archives

From "Cam Bazz" <camb...@gmail.com>
Subject Re: delete a document from indexwriter
Date Mon, 21 Jan 2008 15:10:36 GMT
Yes, I noticed
http://www.archivum.info/java-dev@lucene.apache.org/2006-09/msg00065.html

Somehow I have to do my delete within the same writer. I could add another
field that combines the src and dst fields, indexed but not stored, and
delete on that, but it is still a waste of resources.
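
A minimal sketch of that combined-field workaround, in plain Java so it runs without Lucene on the classpath. The class name EdgeKey, the field name "edgeKey" and the "|" separator are all assumptions made for illustration; the Lucene calls appear only in comments and are my reading of the 2.x API, so check them against the version in use.

```java
import java.util.Objects;

/** Hypothetical helper: builds one indexable key out of srcId and dstId. */
final class EdgeKey {
    /** Hypothetical field name for the combined key. */
    static final String FIELD = "edgeKey";

    /** Assumed separator; it must never occur inside either id. */
    static final String SEPARATOR = "|";

    private EdgeKey() {}

    /** Returns srcId and dstId joined into a single unambiguous term text. */
    static String of(String srcId, String dstId) {
        Objects.requireNonNull(srcId, "srcId");
        Objects.requireNonNull(dstId, "dstId");
        if (srcId.contains(SEPARATOR) || dstId.contains(SEPARATOR)) {
            throw new IllegalArgumentException("ids must not contain " + SEPARATOR);
        }
        return srcId + SEPARATOR + dstId;
    }

    public static void main(String[] args) {
        String key = EdgeKey.of("1", "2");
        // Index time (assumed Lucene 2.x usage): add the key to the document as
        // an untokenized field that is indexed but not stored.
        // Delete time, on the same writer:
        //   writer.deleteDocuments(new Term(EdgeKey.FIELD, key));
        System.out.println(FIELD + ":" + key);
    }
}
```

The cost is one extra indexed term per document, which is the waste mentioned above, but the delete stays a single-term delete on the writer.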

I wonder if IndexWriter could be modified to accept a boolean query with
termA and termB instead of a single term?
I read the code up to the part where the deleted term is added to a hash
map, but could not follow it after that.

Best,
-C.B.



  synchronized private void addDeleteTerm(Term term, int docCount) {
    Num num = (Num) bufferedDeleteTerms.get(term);
    if (num == null) {
      bufferedDeleteTerms.put(term, new Num(docCount));
      // This is coarse approximation of actual bytes used:
      numBytesUsed += (term.field().length() + term.text().length()) * BYTES_PER_CHAR
          + 4 + 5 * OBJECT_HEADER_BYTES + 5 * OBJECT_POINTER_BYTES;
      if (ramBufferSize != IndexWriter.DISABLE_AUTO_FLUSH
          && numBytesUsed > ramBufferSize) {
        bufferIsFull = true;
      }
    } else {
      num.setNum(docCount);
    }
    numBufferedDeleteTerms++;
  }
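
To make the bookkeeping in that snippet easier to follow, here is a simplified stand-alone model of just the map and the counter. It is a sketch, not the Lucene code: the class name, the use of a plain String for the term, and the accessor methods are assumptions, and the RAM accounting is left out. How the stored docCount is consumed at flush time is not visible in the snippet, so the model does not attempt it.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of the buffered-delete bookkeeping above. */
final class BufferedDeletesModel {
    // term text -> docCount recorded by the most recent delete of that term
    private final Map<String, Integer> bufferedDeleteTerms = new HashMap<>();

    // counts every delete call, including repeats of the same term
    private int numBufferedDeleteTerms;

    /** Mirrors addDeleteTerm: insert on first sight, overwrite on repeat. */
    synchronized void addDeleteTerm(String term, int docCount) {
        bufferedDeleteTerms.put(term, docCount);
        numBufferedDeleteTerms++;
    }

    /** docCount last recorded for the term, or null if it was never deleted. */
    synchronized Integer docCountFor(String term) {
        return bufferedDeleteTerms.get(term);
    }

    synchronized int distinctTerms() {
        return bufferedDeleteTerms.size();
    }

    synchronized int deleteCalls() {
        return numBufferedDeleteTerms;
    }

    public static void main(String[] args) {
        BufferedDeletesModel model = new BufferedDeletesModel();
        model.addDeleteTerm("srcId:1", 3);
        model.addDeleteTerm("srcId:1", 7); // same term again: entry is updated
        System.out.println(model.docCountFor("srcId:1") + " "
            + model.distinctTerms() + " " + model.deleteCalls());
    }
}
```

The point for the boolean-query idea is that the map is keyed by a single term, so a two-term delete has no natural slot in this structure as posted.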




Best Regards,


On Jan 21, 2008 4:55 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

>
> You will have to close the IndexWriter.
>
> Only one "writer" may be open at once on an index, where "writer"
> includes an IndexReader that has done some deletes (the first time
> you delete a document using a reader, it will acquire the write.lock,
> which will fail if you have another writer open on that index).
>
> Mike
>
> Cam Bazz wrote:
>
> > Hello Michael;
> >
> > How can I construct a chain where both the reader and the writer are in
> > the same state? You can call the getIndexReader method of the
> > IndexSearcher. But when I delete documents through the reader, how will
> > this interact with the writer? I have disabled autoflush and am using my
> > own logic to do flushes, since I have very small document sizes. However,
> > I am a little confused about how to use IndexWriter, IndexReader and
> > IndexSearcher in the same logic. Basically, I currently have an
> > IndexWriter and an IndexSearcher; I write documents and call flush()
> > myself. In this scenario, how can I access a reader so that my logic
> > still works?
> >
> > Best.
> > -C.B.
> >
> >
> >
> >
> >
> > On Jan 21, 2008 4:28 PM, Michael McCandless <lucene@mikemccandless.com>
> > wrote:
> >
> >>
> >> For this case, too, you will need to use an IndexReader, or use
> >> IndexSearcher to run that particular search and then delete the
> >> docIDs returned using the IndexReader.
> >>
> >> Though, be sure to first iterate through all hits, gathering all
> >> docIDs.  And then in 2nd pass, do the deletions.  Otherwise you'll
> >> hit this issue:
> >>
> >>     https://issues.apache.org/jira/browse/LUCENE-1096
> >>
> >> (Unless you're using 2.3).
> >>
> >> You can also use Solr, which provides "delete by query".
> >>
> >> Mike
> >>
> >> Cam Bazz wrote:
> >>
> >>> Hello Mike;
> >>>
> >>> How about deleting by a compound term?
> >>>
> >>> For example, if I have a document with two fields, srcId and dstId,
> >>> and I want to delete the document where srcId=1 and dstId=2.
> >>>
> >>> Right now there is IndexWriter.deleteDocuments(Term t), but with that
> >>> I can only delete, let's say, where srcId=something.
> >>>
> >>> I am sure there is a workaround but I could not find it.
> >>>
> >>> Best,
> >>>
> >>> On Jan 19, 2008 1:07 PM, Michael McCandless <lucene@mikemccandless.com>
> >>> wrote:
> >>>
> >>>>
> >>>> Good question....
> >>>>
> >>>> So far, this method has not been carried over to IndexWriter because
> >>>> in general it's not really safe, since there's no way to get an
> >>>> accurate docID from IndexWriter itself.
> >>>>
> >>>> You can't really "know" when IndexWriter does merges that compact
> >>>> deletes and thus change docIDs.  So, if you open a reader on the
> >>>> side, get a docID you want to delete, and then go and ask IndexWriter
> >>>> to delete that docID, you may in fact delete the wrong document.  In
> >>>> 2.3, where segment merges are now done with a background thread, it's
> >>>> even worse, because a merge could complete and be committed, thus
> >>>> changing docIDs, at any time...
> >>>>
> >>>> See complex discussion here:
> >>>>
> >>>>     http://markmail.org/message/wxqel3gd6cmavk5a
> >>>>
> >>>> As of 2.3, the low level infrastructure was added to IndexWriter for
> >>>> deleting by document ID, but this is not exposed publicly (this was a
> >>>> side effect of LUCENE-1112).  It's only used, internally, to delete a
> >>>> document if an exception is hit while indexing it.  In theory, you
> >>>> could then subclass IndexWriter and tap into this infrastructure to
> >>>> delete by docID, but, you're entering dangerous territory!
> >>>>
> >>>> Do you have a specific use case in mind here?  I think we'd like to
> >>>> make this option available someday in IndexWriter, but doing so now
> >>>> (when there is no way to get a "reliable" docID) seems too
> >>>> dangerous...
> >>>>
> >>>> Mike
> >>>>
> >>>> Cam Bazz wrote:
> >>>>
> >>>>> Hello,
> >>>>>
> >>>>> How do I delete a specific document from an IndexWriter? I understand
> >>>>> there is deleteDocuments(term), which deletes all the documents
> >>>>> matching the term. But what if I want to delete a specific document
> >>>>> that has more than one term? I can search for the document with a
> >>>>> boolean query and then get the doc id. I know that doc ids are
> >>>>> temporary, but can I not use one for the delete?
> >>>>>
> >>>>> IndexReader has a delete-by-doc-id method, but I am not sure how to
> >>>>> use it when using an IndexWriter.
> >>>>>
> >>>>> Best,
> >>>>> C.B.
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>>
> >>
> >>
> >>
> >>
>
>
>
>
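
For reference, the two-pass advice quoted above (first gather every matching docID, then delete) can be sketched in plain Java. This is only an illustration of the pattern: TwoPassDelete, Edge and the in-memory map are toy stand-ins invented here, not Lucene classes, and a real version would collect docIDs from the search hits and delete them through the IndexReader.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index, used only to show gather-then-delete. */
final class TwoPassDelete {
    /** Toy document carrying the two fields discussed in this thread. */
    static final class Edge {
        final String srcId;
        final String dstId;

        Edge(String srcId, String dstId) {
            this.srcId = srcId;
            this.dstId = dstId;
        }
    }

    // docID -> document; hypothetical replacement for a real index
    private final Map<Integer, Edge> docs = new LinkedHashMap<>();
    private int nextDocId;

    /** Adds a document and returns the docID assigned to it. */
    int add(String srcId, String dstId) {
        docs.put(nextDocId, new Edge(srcId, dstId));
        return nextDocId++;
    }

    /** Deletes every document matching both fields; returns how many. */
    int deleteWhere(String srcId, String dstId) {
        // Pass 1: walk all hits and only record their docIDs.
        List<Integer> toDelete = new ArrayList<>();
        for (Map.Entry<Integer, Edge> entry : docs.entrySet()) {
            Edge edge = entry.getValue();
            if (edge.srcId.equals(srcId) && edge.dstId.equals(dstId)) {
                toDelete.add(entry.getKey());
            }
        }
        // Pass 2: delete only after the iteration above has finished.
        for (Integer docId : toDelete) {
            docs.remove(docId);
        }
        return toDelete.size();
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        TwoPassDelete index = new TwoPassDelete();
        index.add("1", "2");
        index.add("1", "3");
        index.add("1", "2");
        System.out.println(index.deleteWhere("1", "2") + " deleted, "
            + index.size() + " left");
    }
}
```

Keeping the two passes separate means nothing is removed while the hits are still being iterated, which is the situation the quoted advice warns about.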
