lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2897) apply delete-by-Term and docID immediately to newly flushed segments
Date Sat, 29 Jan 2011 16:21:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988472#action_12988472 ]

Michael McCandless commented on LUCENE-2897:
--------------------------------------------

bq. I had to read this a few times, yes it's very elegant as we're skipping the postings that
otherwise would be deleted immediately after flush, and we're reusing the terms map already
in DWPT.

Well... I think we can't [easily] skip writing the postings, because that could result in
non-deterministic behavior (I put a comment on this in the patch).

If we did the flush w/ 2 passes (first pass to mark all del docs and 2nd to flush) then we
could skip writing postings of docs that were deleted.  But I suspect that's too much cost
on flush.

With a single pass, we'd end up writing some postings for the doc, but not all, depending
on the order in which its terms arrived vs its deleted terms.
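
To make the ordering problem concrete, here is a minimal toy sketch in Java. It is not Lucene
code: FlushPassSketch, singlePass and twoPass are hypothetical names, and it assumes terms are
visited in sorted order and that every delete term applies to the whole segment. It only
contrasts which postings each strategy would end up writing.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a flush; none of these names are Lucene APIs. */
class FlushPassSketch {

    /**
     * Single pass: a doc is only known to be deleted once a delete term that
     * hits it has been visited, so postings of earlier terms still contain it.
     */
    static Map<String, List<Integer>> singlePass(TreeMap<String, List<Integer>> postings,
                                                 Set<String> deleteTerms) {
        BitSet deleted = new BitSet();
        Map<String, List<Integer>> written = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            if (deleteTerms.contains(e.getKey())) {
                for (int doc : e.getValue()) {
                    deleted.set(doc);
                }
            }
            written.put(e.getKey(), keepLive(e.getValue(), deleted));
        }
        return written;
    }

    /** Two passes: mark every deleted doc first, then write only live docs. */
    static Map<String, List<Integer>> twoPass(TreeMap<String, List<Integer>> postings,
                                              Set<String> deleteTerms) {
        BitSet deleted = new BitSet();
        for (String term : deleteTerms) {
            for (int doc : postings.getOrDefault(term, List.of())) {
                deleted.set(doc);
            }
        }
        Map<String, List<Integer>> written = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            written.put(e.getKey(), keepLive(e.getValue(), deleted));
        }
        return written;
    }

    /** Copies the docIDs that are not marked deleted. */
    private static List<Integer> keepLive(List<Integer> docs, BitSet deleted) {
        List<Integer> kept = new ArrayList<>();
        for (int doc : docs) {
            if (!deleted.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }
}
```

With postings body:apple -> [0, 1], id:1 -> [1], title:zebra -> [1] and the delete term id:1,
the single pass still writes doc 1 under body:apple (visited before id:1) but drops it under
title:zebra, while the two-pass variant drops it everywhere.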

I mean, in practice, an app is gonna delete against an ID field (typically) so if we "knew" that
down deep here in Luceneland we could do the first pass only against that one field...

Also, merge is still going to have to apply del docs, since e.g. stored fields have already
written the deleted docs.

> apply delete-by-Term and docID immediately to newly flushed segments
> --------------------------------------------------------------------
>
>                 Key: LUCENE-2897
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2897
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-2897.patch
>
>
> Spinoff from LUCENE-2324.
> When we flush deletes today, we keep them as buffered Term/Query/docIDs that need to
> be deleted.  But, for a newly flushed segment (ie fresh out of the DWPT), this is silly, because
> during flush we visit all terms and we know their docIDs.  So it's more efficient to apply
> the deletes (for this one segment) at that time.
> We still must buffer deletes for all prior segments, but these deletes don't need to
> map to a docIDUpto anymore; ie we just need a Set.
> This issue should wait until LUCENE-1076 is in since that issue cuts over buffered deletes
> to a transactional stream.
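
The idea in the issue description above can be illustrated with a rough Java sketch. This is
a toy model, not the patch: FlushDeletesSketch and its methods are hypothetical names, and it
assumes a buffered delete-by-Term for the in-flight segment carries a docIDUpto that acts as an
exclusive upper bound (the delete hits only docs added before it). For prior segments only the
set of terms is kept.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of applying deletes while flushing; not Lucene code. */
class FlushDeletesSketch {

    /**
     * While visiting each term of the segment being flushed, mark the docs that
     * a buffered delete-by-Term hits. Assumption: the delete applies to docIDs
     * strictly below its docIDUpto.
     */
    static BitSet applyAtFlush(TreeMap<String, List<Integer>> postings,
                               Map<String, Integer> deleteTermToDocIDUpto) {
        BitSet deleted = new BitSet();
        for (Map.Entry<String, List<Integer>> e : postings.entrySet()) {
            Integer upto = deleteTermToDocIDUpto.get(e.getKey());
            if (upto == null) {
                continue; // no buffered delete for this term
            }
            for (int doc : e.getValue()) {
                if (doc < upto) {
                    deleted.set(doc);
                }
            }
        }
        return deleted;
    }

    /** For all prior segments the docIDUpto is irrelevant; a plain Set of terms is enough. */
    static Set<String> pendingForPriorSegments(Map<String, Integer> deleteTermToDocIDUpto) {
        return new HashSet<>(deleteTermToDocIDUpto.keySet());
    }
}
```

For example, if a doc with id:7 is added as doc 0, then deleted by id:7 (docIDUpto 1), then
re-added as doc 2, applyAtFlush marks only doc 0, and the set kept for prior segments contains
just id:7.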

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

