jackrabbit-oak-issues mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Date Tue, 07 Mar 2017 15:37:37 GMT

    [ https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899605#comment-15899605 ]

Marcel Reutegger edited comment on OAK-3070 at 3/7/17 3:37 PM:
---------------------------------------------------------------

I think the margin was introduced because of how {{VersionGCSupport.getPossiblyDeletedDocs()}}
compares the two timestamps. With the patch, the garbage collector may miss some documents.
Consider the following GC runs with the patch:

Initially {{getPossiblyDeletedDocs()}} will return documents matching
{noformat}
getModifiedInSecs(0) < getModifiedInSecs(doc._modified) <= getModifiedInSecs(t1)
{noformat}
In the subsequent run it will return documents matching
{noformat}
getModifiedInSecs(t1) < getModifiedInSecs(doc._modified) <= getModifiedInSecs(t2)
{noformat}
There may be documents modified after t1 that still fall into the same 5 second resolution
bucket as t1. The second run will not match them.
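
To illustrate the bucket problem, here is a minimal sketch. It assumes that {{getModifiedInSecs()}} rounds a millisecond timestamp down to a multiple of 5 seconds; the class name, the {{matches()}} helper and the sample timestamps are hypothetical and only model the range check of the second run, not the actual query code.

```java
import java.util.concurrent.TimeUnit;

public class ModifiedBucketDemo {

    // Assumed resolution of the _modified value, per the 5 second buckets described above.
    static final long RESOLUTION_SECS = 5;

    // Hypothetical stand-in for getModifiedInSecs(): rounds a millisecond
    // timestamp down to the start of its 5 second bucket.
    static long getModifiedInSecs(long timestampMillis) {
        long secs = TimeUnit.MILLISECONDS.toSeconds(timestampMillis);
        return secs - secs % RESOLUTION_SECS;
    }

    // Models the range check: exclusive lower bound, inclusive upper bound,
    // all three values compared at bucket resolution.
    static boolean matches(long fromMillis, long modifiedMillis, long toMillis) {
        long modified = getModifiedInSecs(modifiedMillis);
        return getModifiedInSecs(fromMillis) < modified
                && modified <= getModifiedInSecs(toMillis);
    }

    public static void main(String[] args) {
        long t1 = 11_000L; // first GC run, bucket [10s, 15s)
        long t2 = 60_000L; // second GC run

        long sameBucket = 13_000L; // modified after t1, still in bucket [10s, 15s)
        long nextBucket = 16_000L; // modified after t1, in bucket [15s, 20s)

        // Second run: lower bound t1, upper bound t2.
        System.out.println(matches(t1, sameBucket, t2)); // prints "false"
        System.out.println(matches(t1, nextBucket, t2)); // prints "true"
    }
}
```

The document modified at 13s was changed after the first run at t1, so the first run could not have seen that change, and the second run skips it because its bucket value equals that of t1.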

I'll update the issue with a new patch...


was (Author: mreutegg):
I think the margin was introduced because of how {{VersionGCSupport.getPossiblyDeletedDocs()}}
compares the two timestamps. With the patch, the garbage collector may miss some documents.
Consider the following GC runs with the patch:

Initially {{getPossiblyDeletedDocs()}} will return {{0 < getModifiedInSecs(doc) <= t1}}.
In the subsequent run it will return {{t1 < getModifiedInSecs(doc) <= t2}}. There may
be documents modified after t1 that still fall into the same 5 second resolution bucket as
t1. The second run will not match them.

I'll update the issue with a new patch...

> Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
> -----------------------------------------------------------------------------------
>
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: performance
>         Attachments: OAK-3070.patch, OAK-3070-updated.patch, OAK-3070-updated.patch
>
>
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run. 
> {quote}
> The lowest last-validated _modified value could be saved in the settings
> collection and reused for the next run.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
