jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-3349) Partial compaction
Date Mon, 27 Mar 2017 12:52:41 GMT

    [ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824201#comment-15824201 ]

Michael Dürig edited comment on OAK-3349 at 3/27/17 12:52 PM:
--------------------------------------------------------------

An alternative approach (let's call it tail compaction) would be to rebase *some* revisions
onto a previous, already *compacted revision* instead of rebasing *all* revisions onto the *empty
revision* (as we currently do). Let {{r0}} be an initial, compact revision and let {{r1}}, {{r2}},
{{r3}} be subsequent revisions, in that order, created by normal repository operation. The current
compaction approach rewrites {{r3}}, which also rewrites everything in {{r0}} (even though it is
already compact). Instead, we could rebase the difference between {{r0}} and {{r3}} onto {{r0}},
effectively rewriting only the changes that came in with {{r1}}, {{r2}} and {{r3}}. The latter
approach should be much lighter, as it only rewrites recent changes and leaves everything that
was already compacted untouched.
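To make the difference in work concrete, here is a minimal sketch under stated assumptions: node states are modelled as hypothetical immutable in-memory trees ({{Node}}, {{Writer}}, {{tree}}, {{set_child}}, {{full_compact}} and {{tail_compact}} are illustrative names only, not Oak's segment-tar API), and Python object identity stands in for two revisions sharing the same record. {{full_compact}} rewrites every node of {{r3}}, whereas {{tail_compact}} walks the changes from {{r0}} to {{r3}} and reuses every sub-tree of {{r0}} that did not change.

```python
from dataclasses import dataclass

# Hypothetical in-memory model of immutable node states; this is NOT Oak's
# NodeState/segment API. Object identity (`is`) stands in for "same record".
@dataclass(frozen=True)
class Node:
    props: tuple = ()      # sorted (name, value) pairs
    children: tuple = ()   # sorted (name, Node) pairs

    def child(self, name):
        return dict(self.children).get(name)


def tree(props=None, **children):
    return Node(tuple(sorted((props or {}).items())), tuple(sorted(children.items())))


def set_child(node, name, new_child):
    # Copy-on-write update: every other child is shared with `node`.
    kids = dict(node.children)
    kids[name] = new_child
    return Node(node.props, tuple(sorted(kids.items())))


class Writer:
    """Hypothetical record writer that only counts rewritten nodes."""
    def __init__(self):
        self.written = 0

    def write(self, props, children):
        self.written += 1
        return Node(tuple(sorted(props.items())), tuple(sorted(children.items())))


def full_compact(node, w):
    # Current approach: rebase onto the empty revision, i.e. rewrite every
    # node reachable from `node`.
    kids = {name: full_compact(c, w) for name, c in node.children}
    return w.write(dict(node.props), kids)


def tail_compact(base, head, w):
    # Tail compaction: apply the changes base -> head onto the already
    # compact `base`, reusing every sub-tree that did not change.
    if head is base:
        return base                      # untouched since base: keep as is
    if base is None:
        return full_compact(head, w)     # sub-tree added after base
    base_kids = dict(base.children)
    kids = {name: tail_compact(base_kids.get(name), c, w) for name, c in head.children}
    return w.write(dict(head.props), kids)  # removed children are simply dropped


# r0 is compact; r1..r3 are later revisions sharing unchanged sub-trees.
r0 = tree(a=tree(x=tree(), y=tree()), b=tree(z=tree()))
r1 = set_child(r0, "c", tree())                                 # add /c
r2 = set_child(r1, "b", set_child(r1.child("b"), "w", tree()))  # add /b/w
r3 = set_child(r2, "c", tree({"p": 1}))                         # modify /c

full_w, tail_w = Writer(), Writer()
assert full_compact(r3, full_w) == r3
assert tail_compact(r0, r3, tail_w) == r3
print(full_w.written, tail_w.written)  # 8 4
```

In this toy example full compaction rewrites all 8 nodes of {{r3}}, while tail compaction rewrites only /b, /b/w, /c and the root (4 nodes); the unchanged /a sub-tree of {{r0}} is referenced as is.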


was (Author: mduerig):
An alternative approach would be to rebase *some* revisions onto a previous, already *compacted
revision* instead of rebasing *all* revisions onto the *empty revision* (as we currently do).
Let {{r0}} be an initial, compact revision and let {{r1}}, {{r2}}, {{r3}} be subsequent revisions,
in that order, created by normal repository operation. The current compaction approach rewrites
{{r3}}, which also rewrites everything in {{r0}} (even though it is already compact). Instead, we
could rebase the difference between {{r0}} and {{r3}} onto {{r0}}, effectively rewriting only the
changes that came in with {{r1}}, {{r2}} and {{r3}}. The latter approach should be much lighter, as
it only rewrites recent changes and leaves everything that was already compacted untouched.


> Partial compaction
> ------------------
>
>                 Key: OAK-3349
>                 URL: https://issues.apache.org/jira/browse/OAK-3349
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc, scalability
>             Fix For: 1.8, 1.7.3
>
>
> On large repositories, compaction can take quite a while to run, as it needs to create a
> full deep copy of the current root node state. In such cases it could be beneficial to
> compact the repository partially, thus splitting full compaction over multiple cycles.

> Partial compaction would run compaction on a sub-tree just like we now run it on the
> full tree. Afterwards it would create a new root node state from the previous root node
> state, replacing said sub-tree with the compacted one.
> Todo: Assess feasibility and impact, implement prototype.
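The partial compaction idea above can be sketched under the same kind of assumptions: a hypothetical immutable in-memory tree model ({{Node}}, {{Writer}}, {{tree}}, {{compact}} and {{partial_compact}} are illustrative names, not Oak's API), with object identity standing in for record sharing. {{partial_compact}} deep-copies only the sub-tree at a given path and then rewrites the ancestors on that path, so the new root still references every other sub-tree of the previous root.

```python
from dataclasses import dataclass

# Hypothetical in-memory model of immutable node states; NOT Oak's API.
@dataclass(frozen=True)
class Node:
    props: tuple = ()      # sorted (name, value) pairs
    children: tuple = ()   # sorted (name, Node) pairs

    def child(self, name):
        return dict(self.children).get(name)


def tree(props=None, **children):
    return Node(tuple(sorted((props or {}).items())), tuple(sorted(children.items())))


class Writer:
    """Hypothetical record writer that only counts rewritten nodes."""
    def __init__(self):
        self.written = 0

    def write(self, props, children):
        self.written += 1
        return Node(tuple(sorted(props.items())), tuple(sorted(children.items())))


def compact(node, w):
    # Full compaction of one (sub-)tree: a deep copy of every node in it.
    kids = {name: compact(c, w) for name, c in node.children}
    return w.write(dict(node.props), kids)


def partial_compact(root, path, w):
    # Compact only the sub-tree at `path` (assumed to exist); then rebuild its
    # ancestors so the new root keeps referencing all other sub-trees.
    if not path:
        return compact(root, w)
    name, rest = path[0], path[1:]
    kids = dict(root.children)
    kids[name] = partial_compact(kids[name], rest, w)
    return w.write(dict(root.props), kids)


root = tree(a=tree(x=tree(), y=tree()), b=tree(z=tree()))

# Cycle 1 compacts /a only; /b is still the old, uncompacted record.
w1 = Writer()
root1 = partial_compact(root, ("a",), w1)
assert root1 == root and root1.child("b") is root.child("b")

# Cycle 2 compacts /b; together the two cycles cover the whole tree.
w2 = Writer()
root2 = partial_compact(root1, ("b",), w2)
print(w1.written, w2.written)  # 4 3
```

In this sketch each cycle rewrites the chosen sub-tree plus the nodes on the path from the root down to it; everything else is carried over by reference from the previous root.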



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
