jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-3348) Cross gc sessions might introduce references to pre-compacted segments
Date Mon, 14 Mar 2016 09:55:34 GMT

    [ https://issues.apache.org/jira/browse/OAK-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193009#comment-15193009 ]

Michael Dürig commented on OAK-3348:

I just [pushed|https://github.com/mduerig/jackrabbit-oak/commit/f0d5028a126d18996f6e0503da1efdc45ef6fa11]
a possible solution for the segment node state ids. Instead of using a counter, I reuse
the {{RecordId}} of the node state itself: when a segment node state is written for the first
time, {{NodeStateWriter#writeRecordContent}} takes the id returned by {{SegmentBufferWriter#prepare}}
and writes it as the first id to the buffer. Otherwise it just writes the id passed in via {{nodeId}}.
{{SegmentNodeState#getId}} can now easily distinguish the two cases and return either the
id of the segment node state itself or the id that the stored id points to.
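
A minimal sketch of the idea, not the code from the commit: {{RecordIdSketch}} and {{NodeRecordSketch}} are hypothetical stand-ins for the real Oak classes, and passing {{null}} as {{nodeId}} to mark a first write is an assumption made only for this illustration. In this simplified model both cases end up returning the stored id, so a compacted copy keeps the id of the original.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical stand-in for a record id: a segment UUID plus an offset.
final class RecordIdSketch {
    final UUID segmentId;
    final int offset;

    RecordIdSketch(UUID segmentId, int offset) {
        this.segmentId = Objects.requireNonNull(segmentId);
        this.offset = offset;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof RecordIdSketch)) {
            return false;
        }
        RecordIdSketch other = (RecordIdSketch) o;
        return offset == other.offset && segmentId.equals(other.segmentId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(segmentId, offset);
    }
}

// Hypothetical stand-in for a written node state record.
final class NodeRecordSketch {
    private final RecordIdSketch ownId;    // where this record was written
    private final RecordIdSketch storedId; // first id written into the record

    private NodeRecordSketch(RecordIdSketch ownId, RecordIdSketch storedId) {
        this.ownId = ownId;
        this.storedId = storedId;
    }

    // nodeId == null models the first write: the record stores its own id.
    // Otherwise the id handed in (for example by compaction) is stored.
    static NodeRecordSketch write(RecordIdSketch ownId, RecordIdSketch nodeId) {
        return new NodeRecordSketch(ownId, nodeId == null ? ownId : nodeId);
    }

    // True if the stored id is the record's own id.
    boolean isFirstWrite() {
        return storedId.equals(ownId);
    }

    // Stable id: the record's own id on first write, the stored id otherwise.
    RecordIdSketch getId() {
        return isFirstWrite() ? ownId : storedId;
    }
}
```

Writing a node once and then "compacting" it by calling {{write}} with a new location and the original's {{getId()}} yields two records that report the same id.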

These ids take up 18 bytes (2 longs for the uuid of the segment and a short for the offset)
plus another 3 bytes for the record id written to the node state itself.  The former 18 bytes
however only affect compacted nodes. They are not present for uncompacted nodes and we could
also leave them out in the off-line compaction case. 
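
The 18 byte figure can be checked with a small sketch. The class and method names are hypothetical, and the byte layout (most significant long, least significant long, then the offset) is an assumption for illustration rather than the actual on-disk format; the 3 byte record id size is simply the number quoted above.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical helper illustrating the size of a serialized stable id.
final class StableIdSizeSketch {
    // Two longs for the segment UUID plus one short for the offset.
    static final int STABLE_ID_BYTES = 2 * Long.BYTES + Short.BYTES;

    // Size of the record id written to the node state, as quoted in the comment.
    static final int RECORD_ID_BYTES = 3;

    // Serializes the segment UUID and offset into STABLE_ID_BYTES bytes.
    static byte[] encode(UUID segmentId, short offset) {
        return ByteBuffer.allocate(STABLE_ID_BYTES)
                .putLong(segmentId.getMostSignificantBits())
                .putLong(segmentId.getLeastSignificantBits())
                .putShort(offset)
                .array();
    }
}
```

Under these assumptions a compacted node carries 18 + 3 = 21 extra bytes, while an uncompacted node carries only the 3 byte record id.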

> Cross gc sessions might introduce references to pre-compacted segments
> ----------------------------------------------------------------------
>                 Key: OAK-3348
>                 URL: https://issues.apache.org/jira/browse/OAK-3348
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segmentmk
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>              Labels: candidate_oak_1_0, candidate_oak_1_2, cleanup, compaction, gc
>             Fix For: 1.6
>         Attachments: OAK-3348-1.patch, OAK-3348-2.patch, OAK-3348.patch, SCIT.patch,
>                      cleanup-time.png, compaction-time.png, cross-gc-refs.pdf, image.png, repo-size.png
> I suspect that certain write operations during compaction can cause references from compacted
> segments to pre-compacted ones. This would effectively prevent the pre-compacted segments
> from getting evicted in subsequent cleanup phases.
> The scenario is as follows:
> * A session is opened and a lot of content is written to it such that the update limit
>   is exceeded. This causes the changes to be written to disk.
> * Revision gc runs, causing a new, compacted root node state to be written to disk.
> * The session saves its changes. This causes rebasing of its changes onto the current
>   root (the compacted one). At this point any node that has been added will be added again in
>   the sub-tree rooted at the current root. Such nodes, however, might have been written to disk
>   *before* revision gc ran and might thus be contained in pre-compacted segments. As I suspect
>   the node-add operation in the rebasing process does *not* create a deep copy of such nodes but
>   rather creates a *reference* to them, a reference to a pre-compacted segment is introduced.
> Going forward we need to validate the above hypothesis, assess its impact and, if necessary, come
> up with a solution.
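
The suspected effect can be illustrated with a toy model. This is a sketch of the hypothesis only, not of Oak's rebase code: {{GcNodeSketch}} is a hypothetical class, and tagging each node with the gc generation of the segment that holds it is an assumption made for the example. Adding a pre-compaction node by reference keeps generation 0 reachable from the compacted root, while a deep copy does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical node tagged with the gc generation of the segment holding it.
final class GcNodeSketch {
    final int generation;
    final Map<String, GcNodeSketch> children = new LinkedHashMap<>();

    GcNodeSketch(int generation) {
        this.generation = generation;
    }

    // Rewrites this node and its whole sub-tree into the target generation.
    GcNodeSketch deepCopy(int targetGeneration) {
        GcNodeSketch copy = new GcNodeSketch(targetGeneration);
        for (Map.Entry<String, GcNodeSketch> e : children.entrySet()) {
            copy.children.put(e.getKey(), e.getValue().deepCopy(targetGeneration));
        }
        return copy;
    }

    // Collects every generation reachable from this node.
    Set<Integer> reachableGenerations() {
        Set<Integer> result = new TreeSet<>();
        collect(this, result);
        return result;
    }

    private static void collect(GcNodeSketch node, Set<Integer> out) {
        out.add(node.generation);
        for (GcNodeSketch child : node.children.values()) {
            collect(child, out);
        }
    }
}
```

In this model a cleanup that retains every reachable generation could not drop generation 0 after a by-reference add, which mirrors the concern described in the issue.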

This message was sent by Atlassian JIRA