jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3348) Cross gc sessions might introduce references to pre-compacted segments
Date Wed, 06 Apr 2016 09:02:25 GMT

    [ https://issues.apache.org/jira/browse/OAK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15227978#comment-15227978
] 

Michael Dürig commented on OAK-3348:
------------------------------------

I [pushed | https://github.com/mduerig/jackrabbit-oak/commits/OAK-3348.poc] a couple of further
commits. Most notably

* [RIP Compactor | https://github.com/mduerig/jackrabbit-oak/commit/399964ca01b61a530ecc4c73b7d041bddc842edb]:
compaction is now done by "just" calling {{SegmentWriter.writeNode}} using the next generation.
This causes the node to be deeply copied. If concurrent commits result in further compaction
cycles, {{SegmentWriter.writeNode}} is just called again with the new head state. Rebasing
onto what has already been compacted in the previous cycle is implicit through the de-duplication
cache. This change became necessary because previously, compaction cycles
would lead to missing ids for node states, as those are assigned by the segment writer
but not by the rebasing process that took place in the compactor.

* [GC generation in the tar index | https://github.com/mduerig/jackrabbit-oak/commit/3be3df740288e11789be7668a718150eb24334cb]:
The tar file index now includes the GC generation for each segment. This greatly speeds up
cleanup, as without it each segment needs to be actually accessed, creating a lot of extra
IO.
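
To illustrate the first item, here is a minimal sketch of compaction as a plain deep copy into the next generation, with a de-duplication cache making the second cycle an implicit rebase. This is a toy model, not the code from the commit: {{CompactionSketch}}, {{Node}}, {{Writer}}, {{stableId}} and {{copies}} are all hypothetical names, and the real {{SegmentWriter.writeNode}} has a different signature and works on segment records.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of these types are Oak classes. */
final class CompactionSketch {

    /** Immutable node tagged with the GC generation it was written in. */
    static final class Node {
        final String stableId;   // hypothetical key used for de-duplication
        final int generation;
        final List<Node> children;

        Node(String stableId, int generation, Node... children) {
            this.stableId = stableId;
            this.generation = generation;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    /** Stand-in for a segment writer bound to a single GC generation. */
    static final class Writer {
        private final int generation;
        private final Map<String, Node> dedupCache = new HashMap<>();
        int copies = 0;          // number of nodes actually copied so far

        Writer(int generation) {
            this.generation = generation;
        }

        /** Deep copies the node into this writer's generation, reusing cached copies. */
        Node writeNode(Node node) {
            if (node.generation == generation) {
                return node;     // already written in the target generation
            }
            Node cached = dedupCache.get(node.stableId);
            if (cached != null) {
                return cached;   // copied in an earlier cycle: the implicit rebase
            }
            Node[] copiedChildren = new Node[node.children.size()];
            for (int i = 0; i < copiedChildren.length; i++) {
                copiedChildren[i] = writeNode(node.children.get(i));
            }
            Node copy = new Node(node.stableId, generation, copiedChildren);
            dedupCache.put(node.stableId, copy);
            copies++;
            return copy;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a", 0);
        Node b = new Node("b", 0);
        Writer writer = new Writer(1);

        // First cycle: compact the head as it was when compaction started.
        writer.writeNode(new Node("root-1", 0, a, b));
        System.out.println("copies after first cycle: " + writer.copies);

        // A concurrent commit added "c": second cycle on the new head.
        writer.writeNode(new Node("root-2", 0, a, b, new Node("c", 0)));
        System.out.println("copies after second cycle: " + writer.copies);
    }
}
```

In this model the second call only copies the new root and the added node; everything compacted in the first cycle comes out of the cache.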

The other commits are small fixes for side findings which were useful to include here.
See the respective issue numbers.

> Cross gc sessions might introduce references to pre-compacted segments
> ----------------------------------------------------------------------
>
>                 Key: OAK-3348
>                 URL: https://issues.apache.org/jira/browse/OAK-3348
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segmentmk
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: candidate_oak_1_0, candidate_oak_1_2, cleanup, compaction, gc
>             Fix For: 1.6
>
>         Attachments: OAK-3348-1.patch, OAK-3348-2.patch, OAK-3348.patch, SCIT.patch, cleanup-time.png, compaction-time.png, cross-gc-refs.pdf, image.png, repo-size.png
>
>
> I suspect that certain write operations during compaction can cause references from compacted segments to pre-compacted ones. This would effectively prevent the pre-compacted segments from getting evicted in subsequent cleanup phases.
> The scenario is as follows:
> * A session is opened and a lot of content is written to it such that the update limit is exceeded. This causes the changes to be written to disk.
> * Revision gc runs causing a new, compacted root node state to be written to disk.
> * The session saves its changes. This causes rebasing of its changes onto the current root (the compacted one). At this point any node that has been added will be added again in the sub-tree rooted at the current root. Such nodes however might have been written to disk *before* revision gc ran and might thus be contained in pre-compacted segments. As I suspect the node-add operation in the rebasing process *not* to create a deep copy of such nodes but to rather create a *reference* to them, a reference to a pre-compacted segment is introduced here.
> Going forward we need to validate the above hypothesis, assess its impact and, if necessary, come up with a solution.
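
The scenario suspected in the quoted description can be modelled with a small sketch. It only illustrates the hypothesis and is not Oak code: {{CrossGcReferenceSketch}}, {{Node}}, {{reachableGenerations}} and {{deepCopy}} are hypothetical names, and generation 0 / generation 1 stand for pre-compacted and compacted segments respectively.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the suspected cross-gc reference; not Oak's API. */
final class CrossGcReferenceSketch {

    /** A node tagged with the GC generation of the segment it lives in. */
    static final class Node {
        final String name;
        final int generation;
        final List<Node> children = new ArrayList<>();

        Node(String name, int generation) {
            this.name = name;
            this.generation = generation;
        }
    }

    /** Collects every GC generation reachable from the given node. */
    static Set<Integer> reachableGenerations(Node node) {
        Set<Integer> result = new HashSet<>();
        collect(node, result);
        return result;
    }

    private static void collect(Node node, Set<Integer> out) {
        out.add(node.generation);
        for (Node child : node.children) {
            collect(child, out);
        }
    }

    /** Copies the node and its whole subtree into the given generation. */
    static Node deepCopy(Node node, int generation) {
        Node copy = new Node(node.name, generation);
        for (Node child : node.children) {
            copy.children.add(deepCopy(child, generation));
        }
        return copy;
    }

    public static void main(String[] args) {
        // Written to disk before revision gc ran (update limit exceeded).
        Node added = new Node("added", 0);
        added.children.add(new Node("child", 0));

        // Rebase onto the compacted root by reference: the suspected behaviour.
        Node byReference = new Node("root", 1);
        byReference.children.add(added);

        // Rebase onto the compacted root with a deep copy.
        Node byCopy = new Node("root", 1);
        byCopy.children.add(deepCopy(added, 1));

        System.out.println("by reference reaches generation 0: "
                + reachableGenerations(byReference).contains(0));
        System.out.println("deep copy reaches generation 0: "
                + reachableGenerations(byCopy).contains(0));
    }
}
```

If rebasing behaves like the by-reference case, the compacted root keeps generation 0 reachable, so a cleanup that only removes unreachable segments could not evict it.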



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
