jackrabbit-oak-issues mailing list archives

From "Alex Parvulescu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OAK-1804) TarMK compaction
Date Tue, 03 Jun 2014 17:58:02 GMT

     [ https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Alex Parvulescu updated OAK-1804:

    Attachment: fast-equals.patch

there's another issue that affects the fast equals comparison of the Segment Record objects
(most notably SegmentNodeStates). After running a compaction, the node states get new record
ids; this triggers a full traversal in all existing functionality that relies on diffing states
(observation, background indexing), which results in the cpu spiking (600% on my local machine)
for a short period of time.

After discussing this with Jukka, I'm attaching a proposed patch. Fair warning, the code is
not too pretty, but it appears to validate the main idea: the compactor should pass along
the compaction map (a link between old records and new records) which can be leveraged for
maintaining a fast equals. This makes the traversal (and cpu) issue go away, but it might
still need some cleanup. [~jukkaz] I'd appreciate some input :)
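For illustration only, here's a minimal sketch of the idea behind the patch: the compactor records a mapping from pre-compaction record ids to their post-compaction counterparts, and the equality check consults that map before falling back to a full content comparison. Record ids are modeled here as plain strings and the class name is hypothetical; the actual patch works on Oak's RecordId/SegmentNodeState types.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the actual patch): a compaction map linking each
// old record id to the new record id the compactor produced for it.
public class CompactionMapSketch {

    // old record id -> new record id
    private final Map<String, String> compactionMap = new HashMap<>();

    // Called by the compactor as it rewrites each record.
    public void put(String before, String after) {
        compactionMap.put(before, after);
    }

    // "Fast equals": two records are considered equal if their ids match
    // directly, or if the compaction map says one is the compacted copy of
    // the other. A miss here would fall back to a full content diff.
    public boolean fastEquals(String a, String b) {
        if (a.equals(b)) {
            return true;
        }
        return b.equals(compactionMap.get(a)) || a.equals(compactionMap.get(b));
    }
}
```

With such a map in place, a diff between a pre-compaction state and its compacted copy short-circuits to "equal" instead of descending into the whole tree, which is what makes the traversal (and cpu) issue go away.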

> TarMK compaction
> ----------------
>                 Key: OAK-1804
>                 URL: https://issues.apache.org/jira/browse/OAK-1804
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segmentmk
>            Reporter: Jukka Zitting
>            Assignee: Alex Parvulescu
>              Labels: production, tools
>             Fix For: 1.0.1, 1.1
>         Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, compaction.patch,
> The TarMK would benefit from periodic "compact" operations that would traverse and recreate
(parts of) the content tree in order to optimize the storage layout. More specifically, such
compaction would:
> * Optimize performance by increasing locality and reducing duplication, both of which
improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing references
to segments where only a subset of content is reachable.

This message was sent by Atlassian JIRA
