jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3349) Partial compaction
Date Thu, 30 Mar 2017 16:04:41 GMT

    [ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949312#comment-15949312 ]

Michael Dürig commented on OAK-3349:
------------------------------------

Initial POC for tail compaction at https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC.

The following graphs show the effect of this approach in {{SegmentCompactionIT}} over 20-something
hourly gc runs.

Post-compaction size is comparable up to the 12th run. At this point tail compaction probably
starts to accumulate too much "previous compacted state". A full compaction might be beneficial
here.
!post-gc-size.png!

Looking at compaction time, tail compaction is actually much faster! Only with the 20th run
does it seem to start to degenerate. (Later on it seems to recover... The test is still
running and I will post further graphs once I have the data.)
!compaction-time.png!

The graph of the number of cycles per gc run confirms the above picture: tail compaction only
went into a force compact (6th cycle) once before the 20th run, whereas the baseline did so
much earlier.
!cycle-count.png!
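
As a reading aid for the cycle count: the sketch below assumes (this is not spelled out in the comment) that a gc run retries compaction for up to five regular cycles when concurrent commits get in the way and then falls back to a forced compaction as the 6th cycle. The names {{run_compaction}}, {{try_cycle}}, {{force}} and {{retry_cycles}} are hypothetical and not Oak API.

```python
from typing import Callable


def run_compaction(try_cycle: Callable[[], bool],
                   force: Callable[[], bool],
                   retry_cycles: int = 5) -> int:
    """Return the number of cycles a gc run needed, or 0 if it gave up.

    try_cycle: one regular attempt; returns True if the compacted state
               could be applied without interference from concurrent commits.
    force:     the fallback attempt (assumed to block writers); returns
               True on success.
    """
    for cycle in range(1, retry_cycles + 1):
        if try_cycle():
            return cycle
    # All regular cycles failed: the forced attempt counts as one more cycle.
    return retry_cycles + 1 if force() else 0
```

Under these assumptions a value of 6 in the graph means that all regular cycles failed and the run had to force compaction.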

> Partial compaction
> ------------------
>
>                 Key: OAK-3349
>                 URL: https://issues.apache.org/jira/browse/OAK-3349
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc, scalability
>             Fix For: 1.8, 1.7.3
>
>         Attachments: compaction-time.png, cycle-count.png, post-gc-size.png
>
>
> On big repositories compaction can take quite a while to run as it needs to create a
> full deep copy of the current root node state. For such cases it could be beneficial if we
> could partially compact the repository, thus splitting full compaction over multiple cycles.
>
> Partial compaction would run compaction on a sub-tree just like we now run it on the
> full tree. Afterwards it would create a new root node state by referencing the previous root
> node state, replacing said sub-tree with the compacted one.
> Todo: Assess feasibility and impact, implement prototype.
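
The idea in the description can be illustrated with a minimal sketch. It is not taken from the POC branch: {{Node}}, {{compact}} and {{partial_compact}} are hypothetical names, and a node is modelled as nothing more than a generation tag plus named children. The point it shows is that only the chosen sub-tree is deep-copied, while the new root keeps referencing all other children of the previous root.

```python
from dataclasses import dataclass, field
from typing import Mapping, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable node state: a generation tag plus named children."""
    generation: int
    children: Mapping[str, "Node"] = field(default_factory=dict)


def compact(node: Node, generation: int) -> Node:
    """Deep-copy a (sub-)tree into the given generation."""
    return Node(generation, {name: compact(child, generation)
                             for name, child in node.children.items()})


def partial_compact(root: Node, path: Tuple[str, ...], generation: int) -> Node:
    """Return a new root that references `root`'s children unchanged,
    except for the sub-tree at `path`, which is replaced by a compacted copy."""
    if not path:
        return compact(root, generation)
    head, rest = path[0], path[1:]
    children = dict(root.children)
    children[head] = partial_compact(root.children[head], rest, generation)
    # Nodes along the path are rewritten; their other children are shared as is.
    return Node(root.generation, children)


# Usage: compact only the "content" sub-tree into generation 1.
old_root = Node(0, {"content": Node(0, {"a": Node(0)}), "index": Node(0)})
new_root = partial_compact(old_root, ("content",), 1)
```

In this toy model {{new_root.children["index"]}} is the very same object as in {{old_root}}, which is where the saving over a full deep copy would come from.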



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
