lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-5693) don't write deleted documents on flush
Date Wed, 21 May 2014 16:58:38 GMT


Michael McCandless commented on LUCENE-5693:

bq. This only makes sense for postings though.

Right, postings are much easier than doc values. But postings are also the most costly to

bq. By writing them some places and not writing them other places, we open the possibility
of extremely confusing corner cases and bugs.

I disagree: I think we discover places that are "relying" on deleted docs behavior, i.e. test
bugs.  When I did this on LUCENE-5675 there were only a few places that relied on deleted

> don't write deleted documents on flush
> --------------------------------------
>                 Key: LUCENE-5693
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
> When we flush a new segment, sometimes some documents are "born deleted", e.g. if the app called IW.deleteDocuments and it matched some not-yet-flushed documents.
> We already compute the liveDocs on flush, but then we continue (wastefully) to send those
known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not writing "born
deleted" docs won't be much of a win for most apps.  Still it seems silly to write them, consuming
IO/CPU in the process, only to consume more IO/CPU later for merging to re-delete them.
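
To make the idea in the description concrete, here is a minimal sketch in plain Java of skipping "born deleted" documents while writing a postings list at flush time. It does not use Lucene's codec API: the class `BornDeletedSketch`, the method `livePostings`, and the use of `java.util.BitSet` as a stand-in for liveDocs are all assumptions made for illustration, not the change proposed on LUCENE-5693 or LUCENE-5675.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

public class BornDeletedSketch {

    /**
     * Hypothetical helper: returns only the doc IDs of a buffered postings
     * list whose bit is set in liveDocs, so deleted docs are never written.
     */
    public static List<Integer> livePostings(int[] bufferedDocIds, BitSet liveDocs) {
        List<Integer> written = new ArrayList<>();
        for (int docId : bufferedDocIds) {
            if (liveDocs.get(docId)) {
                written.add(docId); // live: would be sent to the postings writer
            }
            // deleted: skipped here instead of being written and merged away later
        }
        return written;
    }

    public static void main(String[] args) {
        int maxDoc = 5;

        // Stand-in for the liveDocs computed on flush: all docs start live.
        BitSet liveDocs = new BitSet(maxDoc);
        liveDocs.set(0, maxDoc);

        // Docs 1 and 3 matched a buffered delete before the flush.
        liveDocs.clear(1);
        liveDocs.clear(3);

        // Buffered postings for one term.
        int[] postings = {0, 1, 3, 4};

        System.out.println(livePostings(postings, liveDocs)); // prints [0, 4]
    }
}
```

The sketch filters only the postings list. As the comment above notes, doing the same for doc values is harder, and nothing here attempts it.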
