jackrabbit-oak-issues mailing list archives

From "Marcel Reutegger (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-4780) VersionGarbageCollector should be able to run incrementally
Date Thu, 16 Mar 2017 15:46:41 GMT

    [ https://issues.apache.org/jira/browse/OAK-4780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928304#comment-15928304 ]

Marcel Reutegger edited comment on OAK-4780 at 3/16/17 3:46 PM:
----------------------------------------------------------------

This looks very promising. I'd like to include those changes step by step. That is, first
the VersionGC part in oak-core and in a second step the new run mode for oak-run. I would
even prefer if the second part goes into a separate issue.

Regarding your GitHub branch: it contains a 'patches' directory with two diffs. What are those
changes?

Some more comments:

- VersionGarbageCollector.reset() can be simplified to just the remove() call. It will be
a no-op if the document doesn't exist.
- Can you please add tests for TimeInterval?
- Did you consider moving the new set methods on VersionGarbageCollector to a new class (e.g.
VersionGCOptions) and passing it as an argument to gc()? I think with the current patch it is
possible to influence a running GC by calling one of those set methods.
- What is the TODO about in VersionGCStats.addRun()?
- Usage of LimitExceededException from javax.naming is a bit funky ;) but I guess you didn't
want to invent yet another exception class.
- VersionGarbageCollector.delayOnModification() should use Clock.waitUntil(). This allows
to write efficient tests with a virtual clock.
- Only minor: the diff for VersionGarbageCollector also contains a couple of indentation changes
for anonymous inner classes, which are unrelated to this improvement.
- In MongoVersionGCSupport.getDeletedOnceCount(): {{ReadPreference.nearest().secondaryPreferred()}}.
You cannot have both nearest and secondaryPreferred. This expression will always give you a
secondaryPreferred ReadPreference.
- Minor: some unused imports in VersionGCSupport
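
To make the VersionGCOptions suggestion from the list above more concrete, here is a minimal sketch of how it could look. It is not taken from the patch: the field names (maxDurationMs, batchSize), their defaults, the with...() methods and the CollectorSketch class are all hypothetical. The point is only that an immutable options object handed to gc() cannot be changed by another caller while a run is in progress.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical immutable options holder. Field names and defaults are
 * illustrative assumptions, not the settings from the patch.
 */
final class VersionGCOptions {
    final long maxDurationMs; // 0 means "no time limit" in this sketch
    final int batchSize;      // number of documents handled per batch

    VersionGCOptions() {
        this(0L, 1000);
    }

    private VersionGCOptions(long maxDurationMs, int batchSize) {
        this.maxDurationMs = maxDurationMs;
        this.batchSize = batchSize;
    }

    /** Returns a modified copy; the receiver is left unchanged. */
    VersionGCOptions withMaxDuration(long duration, TimeUnit unit) {
        return new VersionGCOptions(unit.toMillis(duration), batchSize);
    }

    /** Returns a modified copy; the receiver is left unchanged. */
    VersionGCOptions withBatchSize(int newBatchSize) {
        return new VersionGCOptions(maxDurationMs, newBatchSize);
    }
}

/** Stand-in for the collector: each run only sees the options it was given. */
final class CollectorSketch {
    String gc(VersionGCOptions options) {
        // A real run would use these values for its whole duration.
        return "batchSize=" + options.batchSize
                + ", maxDurationMs=" + options.maxDurationMs;
    }
}
```

A caller would build the options once, e.g. new VersionGCOptions().withBatchSize(50), and pass them to gc(). Changing settings for the next run means creating a new options object, which leaves any run already in progress untouched.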


> VersionGarbageCollector should be able to run incrementally
> -----------------------------------------------------------
>
>                 Key: OAK-4780
>                 URL: https://issues.apache.org/jira/browse/OAK-4780
>             Project: Jackrabbit Oak
>          Issue Type: Task
>          Components: core, documentmk
>            Reporter: Julian Reschke
>         Attachments: leafnodes.diff, leafnodes-v2.diff, leafnodes-v3.diff
>
>
> Right now, the documentmk's version garbage collection runs in several phases.
> It first collects the paths of candidate nodes, and only once this has been successfully
> finished, starts actually deleting nodes.
> This can be a problem when the regularly scheduled garbage collection is interrupted
> during the path collection phase, maybe due to other maintenance tasks. On the next run, the
> number of paths to be collected will be even bigger, thus making it even more likely to fail.
> We should think about a change in the logic that would allow the GC to run in chunks;
> maybe by partitioning the path space by top level directory.
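
The chunking idea in the description can be sketched roughly as follows. This is only an illustration under simple assumptions, not how the documentmk implements or would implement it: the class ChunkedGcSketch and its methods are hypothetical, candidate paths are plain strings, and "deleting" is simulated by adding the path to a result list. Each chunk is collected and deleted before the next one starts, so an interrupted run keeps the work done on earlier chunks.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of garbage collection in chunks by top-level path. */
final class ChunkedGcSketch {

    /** Groups paths by their first segment, e.g. "/content/a" goes to "content". */
    static Map<String, List<String>> partitionByTopLevel(Collection<String> paths) {
        Map<String, List<String>> chunks = new TreeMap<>();
        for (String path : paths) {
            String[] segments = path.split("/");
            // The root path "/" has no segments and lands in the "" chunk.
            String top = segments.length > 1 ? segments[1] : "";
            chunks.computeIfAbsent(top, k -> new ArrayList<>()).add(path);
        }
        return chunks;
    }

    /**
     * Processes at most maxChunks chunks and returns the paths handled so far.
     * Stopping early models an interrupted run: finished chunks stay finished.
     */
    static List<String> collectChunks(Map<String, List<String>> chunks, int maxChunks) {
        List<String> deleted = new ArrayList<>();
        int done = 0;
        for (List<String> chunk : chunks.values()) {
            if (done >= maxChunks) {
                break;
            }
            deleted.addAll(chunk); // stand-in for the actual delete phase
            done++;
        }
        return deleted;
    }
}
```

With this shape, the next scheduled run only has to deal with the chunks that were not reached, instead of starting over with an ever larger set of candidate paths.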



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
