lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <>
Subject [jira] Commented: (LUCENE-808) bufferDeleteTerm in IndexWriter might flush prematurely
Date Wed, 21 Feb 2007 19:56:06 GMT


Doron Cohen commented on LUCENE-808:

Ning Li wrote:

> The code correctly reflects its designed semantics:
> numBufferedDeleteTerms is a simple sum of terms passed to
> updateDocument or deleteDocuments.
> If the first of two successive calls to the same term should be
> considered no op if no docs were added in between, shouldn't the first
> also be considered no op if the docs added in between do not contain
> the term? Whether a doc contains a term, however, can only be
> determined at the time of actual deletion for performance reasons.
> Thus I think the original semantics is cleaner.

I agree, the code is correct under 'simple sum' semantics.

Looking at the javadocs for setMaxBufferedDeleteTerms(), it says: 
"minimal number of delete terms". To me, this reads like: "minimal 
number of (actual) delete terms".
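
To make the two readings concrete, here is a minimal sketch. It is not
Lucene code: the class and method names are hypothetical, terms are
plain strings, and the "actual" count is simplified to distinct terms
(it ignores whether docs were added between calls).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of Lucene: two ways to count buffered delete terms. */
final class DeleteTermCounting {

    /** "Simple sum" reading: every delete call counts, even a repeat of the same term. */
    static int simpleSum(List<String> deleteCalls) {
        return deleteCalls.size();
    }

    /** "Actual delete terms" reading (simplified): a repeated term counts once. */
    static int distinctTerms(List<String> deleteCalls) {
        Set<String> seen = new HashSet<>(deleteCalls);
        return seen.size();
    }

    public static void main(String[] args) {
        // Two deletes by the same term, then one by a different term.
        List<String> calls = Arrays.asList("id:7", "id:7", "id:9");
        System.out.println(simpleSum(calls));     // prints 3
        System.out.println(distinctTerms(calls)); // prints 2
    }
}
```

With a threshold of 3, the first reading would reach it on these calls
and the second would not.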

But beyond one definition or another, I guess the question should be
what application developers would expect. For an operation that is
clearly a no-op, wouldn't they expect no side effects?

As an example, if an application calls IndexWriter.flush() twice
in a row, the second call is a no-op and would have no side effects.

Similarly, when editing a document or file, clicking "save" will 
do nothing in case there are no changes (otherwise users would be
quite surprised).

Imagine the application and Lucene could talk. With the current
implementation we could hear something like this:

  [applic] <calling del-by-term>;
  [lucene] <increment buf-del-terms-counter>;
  [applic] <searching>; "why on earth weren't these docs deleted?"
  [applic] <calling del-by-term again for same term>;
  [lucene] <incrementing buf-del-terms-counter again; merging>;
  [applic] <searching>; "that's better! mmm... I wonder why the 
           first delete of this term didn't do it... Was there
           any difference between these calls?"
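
The exchange above can be reproduced with a toy model. This is only a
sketch under assumptions, not IndexWriter's code and not necessarily
what the attached patch does: ToyDeleteBuffer, its fields, and the
skipNoOpRepeats switch are all hypothetical, and a flush is assumed to
happen once the counter reaches the configured maximum.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model, not Lucene's IndexWriter: every name here is hypothetical. */
final class ToyDeleteBuffer {
    private final int maxBufferedDeleteTerms;
    private final boolean skipNoOpRepeats;
    // term -> number of buffered docs at the time the delete was last buffered
    private final Map<String, Integer> bufferedTerms = new HashMap<>();
    private int numBufferedDeleteTerms = 0;
    private int numBufferedDocs = 0;
    private int flushCount = 0;

    ToyDeleteBuffer(int maxBufferedDeleteTerms, boolean skipNoOpRepeats) {
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
        this.skipNoOpRepeats = skipNoOpRepeats;
    }

    void addDocument() {
        numBufferedDocs++;
    }

    void deleteByTerm(String term) {
        Integer previous = bufferedTerms.put(term, numBufferedDocs);
        // A repeat is a no-op if the same term was already buffered
        // and no docs were added since then.
        boolean noOpRepeat = previous != null && previous == numBufferedDocs;
        if (!(skipNoOpRepeats && noOpRepeat)) {
            numBufferedDeleteTerms++;
        }
        if (numBufferedDeleteTerms >= maxBufferedDeleteTerms) {
            flush();
        }
    }

    private void flush() {
        // Stands in for applying the buffered deletes.
        bufferedTerms.clear();
        numBufferedDeleteTerms = 0;
        numBufferedDocs = 0;
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }
}
```

With a maximum of 2 and skipNoOpRepeats off, two successive
deleteByTerm calls for the same term trigger a flush, which is the
surprise in the dialogue. With the switch on, the second call leaves
the counter alone, and a repeat counts again only after addDocument
has been called in between.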

> bufferDeleteTerm in IndexWriter might flush prematurely
> -------------------------------------------------------
>                 Key: LUCENE-808
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.1
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>         Attachments: successive_bufferDeleteTerm.patch
> Successive calls to remove-by-the-same-term would increment numBufferedDeleteTerms
> although all but the first are no-ops if no docs were added in between. Hence deletes
> might be flushed too soon.
> It is a minor problem, should be rare, but it seems cleaner to fix this. 
> Attached patch also fixes TestIndexWriterDelete.testNonRAMDelete() which somehow
> relied on this behavior.  All tests pass.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
