jackrabbit-oak-issues mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4780) VersionGarbageCollector should be able to run incrementally
Date Tue, 07 Mar 2017 13:29:38 GMT

    [ https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15899438#comment-15899438 ]

Marcel Reutegger commented on OAK-4780:

bq. Shall it repeat itself when it has not caught up to "now"

I'd say, yes. If needed, the GC can be canceled already.

bq. What is the best value for "precisionMs", the minimal time interval for queries?

I don't think a one minute resolution is needed. Maybe it's easier if we define how many iterations
are done to find the 'oldest' _deletedOnce? But then again, a time is more specific than a rather
abstract number of iterations.

Other comments on your patch:

- We should first resolve OAK-3070 and remove that part from your patch. 
- VersionGCSupport.getOldestDeletedOnceTimestamp(long) uses System.currentTimeMillis(). It might
be useful to use the Clock abstraction instead, which allows usage of a virtual clock for tests.
- Similar for VersionGarbageCollector.gc(long, TimeUnit): Revision.getCurrentTimestamp() does
give you the current time of a Clock, but I think it would be better to use the clock from
the DocumentNodeStore passed in the constructor.
- {{maxIterations}} and {{maxDuration}}: are those really necessary? I think it would be easier
to use if those are implementation details and all you need to do is trigger gc() with a maxRevisionAge.
The GC would stop iterations when it reaches currentTime - maxRevisionAge or when it is canceled.
- {{batchDelay}}, I like the feature, but would prefer a more adaptive approach. That is,
have a value that defines the delay multiplier which is applied to the time it took for some
operation. Let's say it took 500 ms to remove a batch of documents and the delay multiplier
is 0.5, then the VGC would wait 250 ms until it proceeds to the next batch.
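
A minimal sketch of the last two points, under stated assumptions: the GC is triggered with only a maxRevisionAge, stops at currentTime - maxRevisionAge or on cancel, and waits after each batch for the batch duration times a delay multiplier. All names here (AdaptiveGcSketch, SketchClock, VirtualClock, BatchRemover, collect, delayMillis, windowMs) are hypothetical and not part of Oak's API. The sketch also assumes the time limit is computed once when the run starts, which is one possible choice and not something the patch is known to do.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; none of these types exist in Oak. */
final class AdaptiveGcSketch {

    /** Stand-in for a clock abstraction, so tests can plug in a virtual clock. */
    interface SketchClock {
        long now();              // current time in milliseconds
        void sleep(long millis); // wait for the given number of milliseconds
    }

    /** Virtual clock: time only moves when sleep() or advance() is called. */
    static final class VirtualClock implements SketchClock {
        private long time;
        VirtualClock(long start) { this.time = start; }
        @Override public long now() { return time; }
        @Override public void sleep(long millis) { time += millis; }
        void advance(long millis) { time += millis; }
    }

    /** Removes the garbage found in the time window [fromMs, toMs). */
    interface BatchRemover {
        void remove(long fromMs, long toMs);
    }

    /** Adaptive delay: the time the last batch took, scaled by the multiplier. */
    static long delayMillis(long elapsedMillis, double delayFactor) {
        if (elapsedMillis <= 0 || delayFactor <= 0) {
            return 0;
        }
        return Math.round(elapsedMillis * delayFactor);
    }

    /** Runs window by window until the limit is reached or the run is canceled. */
    static int collect(SketchClock clock, long oldestMs, long maxRevisionAgeMs,
                       long windowMs, double delayFactor,
                       BooleanSupplier canceled, BatchRemover remover) {
        if (windowMs <= 0) {
            throw new IllegalArgumentException("windowMs must be positive");
        }
        // Assumption: the limit is fixed at start so the loop always terminates.
        long limit = clock.now() - maxRevisionAgeMs;
        long from = oldestMs;
        int iterations = 0;
        while (from < limit && !canceled.getAsBoolean()) {
            long to = Math.min(from + windowMs, limit);
            long start = clock.now();
            remover.remove(from, to);
            long elapsed = clock.now() - start;
            // Back off in proportion to how long the batch took.
            clock.sleep(delayMillis(elapsed, delayFactor));
            from = to;
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        VirtualClock clock = new VirtualClock(10_000);
        // Each simulated batch takes 500 ms of virtual time.
        int n = collect(clock, 0, 4_000, 2_000, 0.5, () -> false,
                (from, to) -> clock.advance(500));
        System.out.println(n + " iterations, clock at " + clock.now());
    }
}
```

With a 500 ms batch and a multiplier of 0.5, delayMillis returns 250, matching the example above. Because the clock is injected, the loop can be exercised with a virtual clock and needs no real waiting.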

> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has been successfully finished, starts actually deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is interrupted during the path collection phase, maybe due to other maintenance tasks. On the next run, the number of paths to be collected will be even bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in chunks; maybe by partitioning the path space by top level directory.
