jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3349) Partial compaction
Date Wed, 05 Jul 2017 08:11:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074397#comment-16074397 ]

Michael Dürig commented on OAK-3349:

I [rebased|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-rebased] my earlier
[OAK-3349-POC|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349-POC] branch
on top of the current Oak trunk. I used an overnight run of this [SCIT|https://github.com/mduerig/jackrabbit-oak/commit/0f4db5c3a5052b278a28ee92cf596624521340b7]
configuration to confirm that cleanup is as efficient as it was before rebasing.

On the [OAK-3349|https://github.com/mduerig/jackrabbit-oak/commits/OAK-3349] branch I started
integrating tail compaction and full compaction. In a first step I [replaced|https://github.com/mduerig/jackrabbit-oak/commit/0c11ff13d59f9660ca76ae82b2ffe9ab204b3aba]
the integer representing the gc generation with an ADT {{GCGeneration}} capturing the same
semantics (passes ITs).
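
To illustrate the idea of replacing a bare integer with a small value type, here is a minimal sketch. It is not the {{GCGeneration}} class from the linked commit: the class name {{GcGenerationSketch}}, its methods, and the rule that a full compaction resets the tail counter are all assumptions made for the example.

```java
// Hypothetical sketch only; names and semantics are assumptions,
// not the GCGeneration implementation from the linked commit.
final class GcGenerationSketch {
    private final int full; // assumed: incremented by each full compaction
    private final int tail; // assumed: incremented by each tail compaction

    GcGenerationSketch(int full, int tail) {
        if (full < 0 || tail < 0) {
            throw new IllegalArgumentException("generations must be non-negative");
        }
        this.full = full;
        this.tail = tail;
    }

    int full() { return full; }

    int tail() { return tail; }

    // Assumption: a full compaction starts a new full generation and resets the tail.
    GcGenerationSketch nextFull() { return new GcGenerationSketch(full + 1, 0); }

    // Assumption: a tail compaction keeps the full generation and bumps the tail.
    GcGenerationSketch nextTail() { return new GcGenerationSketch(full, tail + 1); }

    // Lexicographic ordering: full generation first, then tail generation.
    boolean isOlderThan(GcGenerationSketch other) {
        return full < other.full || (full == other.full && tail < other.tail);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof GcGenerationSketch)) {
            return false;
        }
        GcGenerationSketch that = (GcGenerationSketch) o;
        return full == that.full && tail == that.tail;
    }

    @Override
    public int hashCode() { return 31 * full + tail; }

    @Override
    public String toString() { return "GcGenerationSketch{full=" + full + ", tail=" + tail + "}"; }
}
```

Making the type immutable with value equality lets it be used wherever the integer was compared or stored, while keeping the full and tail components separately addressable.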
In a next step I started [integrating|https://github.com/mduerig/jackrabbit-oak/commit/a75604e6a32f2408b9f74a6e229ebaeee1f8de65]
tail compaction and full compaction. There are many loose ends still: 
* The tail generation is not going into the tar index which makes cleanup awkward and inefficient
and precise blob reference collection impossible. 
* Deduplication cache generations are not properly selected as they only take the full generation
into account. 
* There is no way to schedule/trigger full vs. tail compaction. 
* Graceful and correct degradation to full compaction in the case where a base version cannot
be determined is not implemented. 
* The {{gc.log}} does not reflect tail compactions.
* No UT and IT coverage for tail compaction.

All respective locations in the code are marked with {{FIXME OAK-3349}}.

> Partial compaction
> ------------------
>                 Key: OAK-3349
>                 URL: https://issues.apache.org/jira/browse/OAK-3349
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: compaction, gc, scalability
>             Fix For: 1.8, 1.7.4
>         Attachments: compaction-time.png, cycle-count.png, post-gc-size.png
> On big repositories compaction can take quite a while to run as it needs to create a
> full deep copy of the current root node state. For such cases it could be beneficial if we
> could partially compact the repository, thus splitting full compaction over multiple cycles.
>
> Partial compaction would run compaction on a sub-tree just like we now run it on the
> full tree. Afterwards it would create a new root node state by referencing the previous root
> node state, replacing said sub-tree with the compacted one.
> Todo: Assess feasibility and impact, implement prototype.
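
The sub-tree replacement described in the issue can be sketched on a generic immutable tree. This is an illustration under stated assumptions, not Oak code: {{NodeSketch}}, {{deepCopy}} (standing in for compaction), and {{withSubtree}} are hypothetical names, and real node states carry far more than a single string value.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical immutable tree node; not an Oak NodeState.
final class NodeSketch {
    final String value;
    final Map<String, NodeSketch> children;

    NodeSketch(String value, Map<String, NodeSketch> children) {
        this.value = value;
        this.children = Collections.unmodifiableMap(new LinkedHashMap<>(children));
    }

    static NodeSketch leaf(String value) {
        return new NodeSketch(value, Collections.emptyMap());
    }

    // Stand-in for compacting a sub-tree: builds a fresh deep copy of it.
    NodeSketch deepCopy() {
        Map<String, NodeSketch> copy = new LinkedHashMap<>();
        for (Map.Entry<String, NodeSketch> e : children.entrySet()) {
            copy.put(e.getKey(), e.getValue().deepCopy());
        }
        return new NodeSketch(value, copy);
    }

    // Returns a new root in which the node at path[depth..] is replaced.
    // Only nodes along the path are recreated; all sibling sub-trees are
    // shared by reference with the previous root.
    NodeSketch withSubtree(String[] path, int depth, NodeSketch replacement) {
        if (depth == path.length) {
            return replacement;
        }
        String name = path[depth];
        NodeSketch child = children.get(name);
        if (child == null) {
            throw new IllegalArgumentException("No such child: " + name);
        }
        Map<String, NodeSketch> copy = new LinkedHashMap<>(children);
        copy.put(name, child.withSubtree(path, depth + 1, replacement));
        return new NodeSketch(value, copy);
    }
}
```

With this shape, one partial cycle would amount to {{root.withSubtree(path, 0, subtree.deepCopy())}}: the previous root stays untouched, and everything outside the chosen sub-tree is still referenced rather than copied.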

This message was sent by Atlassian JIRA
