jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3349) Partial compaction
Date Thu, 30 Mar 2017 16:04:41 GMT

    [ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949312#comment-15949312 ]

Michael Dürig commented on OAK-3349:

Initial POC for tail compaction at https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC.

The following graphs show the effect of this approach in {{SegmentCompactionIT}} through 20-something
hourly gc runs.

Post-compaction size is comparable up until the 12th run. At this point tail compaction probably
starts to accumulate too much "previous compacted state". A full compaction might be beneficial.

Looking at compaction time, tail compaction is actually much faster! Only with the 20th run
does it seem to start to degenerate. (Later on it seems to recover... I have the test still
running and will post further graphs once I have the data.)

The graph of the number of cycles per gc run confirms the above picture: tail compaction only
went into a force compact (6th cycle) once before the 20th run, where the baseline did much

> Partial compaction
> ------------------
>                 Key: OAK-3349
>                 URL: https://issues.apache.org/jira/browse/OAK-3349
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc, scalability
>             Fix For: 1.8, 1.7.3
>         Attachments: compaction-time.png, cycle-count.png, post-gc-size.png
> On big repositories compaction can take quite a while to run as it needs to create a
> full deep copy of the current root node state. For such cases it could be beneficial if we
> could partially compact the repository, thus splitting full compaction over multiple cycles.

> Partial compaction would run compaction on a sub-tree just like we now run it on the
> full tree. Afterwards it would create a new root node state by referencing the previous root
> node state, replacing said sub-tree with the compacted one.
> Todo: Assess feasibility and impact, implement prototype.
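
As a rough illustration of the sub-tree replacement described in the issue above, here is a
minimal Python sketch. It is not Oak code: `Node`, `compact` and `partial_compact` are
hypothetical names, the tree is a plain in-memory structure, and the deep copy only stands in
for rewriting node states into fresh segments. It assumes node states are treated as immutable,
so the nodes on the path from the root to the compacted sub-tree are re-created while all
other sub-trees are shared with the previous root.

```python
class Node:
    """Hypothetical stand-in for an immutable node state: a value plus named children."""

    def __init__(self, value, children=None):
        self.value = value
        self.children = dict(children or {})


def compact(node):
    """Deep-copy a sub-tree; stands in for full compaction of that sub-tree."""
    return Node(node.value, {name: compact(c) for name, c in node.children.items()})


def partial_compact(root, path):
    """Return a new root that references the previous root's children,
    except for the sub-tree at `path`, which is replaced by a compacted copy.

    `path` is a list of child names; an empty path compacts the whole tree.
    """
    if not path:
        return compact(root)
    head, rest = path[0], path[1:]
    # Shallow copy of the child map: siblings are shared by reference, not copied.
    children = dict(root.children)
    children[head] = partial_compact(root.children[head], rest)
    return Node(root.value, children)


if __name__ == "__main__":
    old_root = Node("root", {"a": Node("a", {"x": Node("x")}), "b": Node("b")})
    new_root = partial_compact(old_root, ["a"])
    # Only /a was copied; /b is still the very same object as in the old root.
    print(new_root.children["b"] is old_root.children["b"])
```

Running `partial_compact` once per sub-tree over several cycles would spread the cost of a
full deep copy across those cycles, which is the splitting the issue description aims at.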

This message was sent by Atlassian JIRA