jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-3070) Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
Date Wed, 15 Feb 2017 19:58:41 GMT

    [ https://issues.apache.org/jira/browse/OAK-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868474#comment-15868474 ]

Vikas Saurabh commented on OAK-3070:

Ah, OK, now I see what you mean. So the correctness can probably be shown from the fact that
scanning _modified would behave the same as scanning _deletedOnce (any state change
that sets or resets _deletedOnce would also update _modified). If we're
handling _modified correctly, then resetting _deletedOnce would work too.
OK, yes, I think that can be an approach.

But now we'd be introducing another write, while _modified is already indexed and querying
with a lower bound should be equally OK, right?
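
To make the lower-bound idea concrete, here is a minimal in-memory sketch of candidate selection. It is not the code from the attached patch and does not use Oak's DocumentStore API; the class name, the Doc type, and the candidates method are all hypothetical stand-ins, and only the _modified and _deletedOnce field names come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

public class LowerBoundCandidates {

    /** Simplified stand-in for a stored document (hypothetical, not Oak's NodeDocument). */
    public static final class Doc {
        public final String id;
        public final long modified;       // plays the role of _modified
        public final boolean deletedOnce; // plays the role of _deletedOnce

        public Doc(String id, long modified, boolean deletedOnce) {
            this.id = id;
            this.modified = modified;
            this.deletedOnce = deletedOnce;
        }
    }

    /**
     * Returns documents that were deleted at least once and whose modified
     * value lies in [lowerInclusive, upperExclusive). Without the lower
     * bound, every deletedOnce document older than the upper bound would
     * be re-checked on each run.
     */
    public static List<Doc> candidates(List<Doc> docs, long lowerInclusive, long upperExclusive) {
        List<Doc> result = new ArrayList<>();
        for (Doc d : docs) {
            if (d.deletedOnce && d.modified >= lowerInclusive && d.modified < upperExclusive) {
                result.add(d);
            }
        }
        return result;
    }
}
```

In a real store the same predicate would be expressed as a range condition on the indexed _modified field, so no extra write is needed to narrow the scan.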

bq. We've seen collection times of >4 hours on a test system, so I'm not sure about that

The attached patch is around 1.5 years old, so I don't know whether it still works.
But it might be a good idea to test this again on a test setup that consistently
shows long times during the candidate-collection phase.

> Use a lower bound in VersionGC query to avoid checking unmodified once deleted docs
> -----------------------------------------------------------------------------------
>                 Key: OAK-3070
>                 URL: https://issues.apache.org/jira/browse/OAK-3070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>              Labels: performance
>         Attachments: OAK-3070.patch
> As part of OAK-3062 [~mreutegg] suggested
> {quote}
> As a further optimization we could also limit the lower bound of the _modified
> range. The revision GC does not need to check documents with a _deletedOnce
> again if they were not modified after the last successful GC run. If they
> didn't change and were considered existing during the last run, then they
> must still exist in the current GC run. To make this work, we'd need to
> track the last successful revision GC run. 
> {quote}
> The lowest last-validated _modified could possibly be saved in the settings collection and reused
> for the next run.
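
A minimal sketch of how the last successful run could be tracked, assuming a plain Map stands in for the settings collection. The class name, the settings key, and the method names are hypothetical and not taken from Oak or from the attached patch. The one point it illustrates is that the saved bound should advance only after a run completes successfully, so a failed run is retried over the same range.

```java
import java.util.Map;

public class GcLowerBoundTracker {

    /** Hypothetical settings key; not a name used by Oak. */
    public static final String KEY = "lastValidatedModified";

    private final Map<String, Long> settings;

    public GcLowerBoundTracker(Map<String, Long> settings) {
        this.settings = settings;
    }

    /** Lower bound for the next run; 0 means no successful run is recorded yet. */
    public long lowerBound() {
        return settings.getOrDefault(KEY, 0L);
    }

    /**
     * Records the outcome of a run that scanned modified values below
     * upperExclusive. The bound is advanced only on success.
     */
    public void runCompleted(long upperExclusive, boolean successful) {
        if (successful) {
            settings.put(KEY, upperExclusive);
        }
    }
}
```

Each run would then scan the range from lowerBound() up to its own upper bound and call runCompleted once it finishes.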

This message was sent by Atlassian JIRA
