jackrabbit-oak-issues mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (OAK-1804) TarMK compaction
Date Thu, 12 Jun 2014 14:35:02 GMT

     [ https://issues.apache.org/jira/browse/OAK-1804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened OAK-1804:
--------------------------------


There are two more problems:

* On a really large repository with hundreds of millions of nodes, the uncompressed compaction
map inside the Compactor class can become huge, up to a few gigabytes. It would be better
if we could use the far more memory-efficient CompactionMap data structure instead, and perhaps
further limit the number of entries we store in the map in the first place.
* The compaction checks in fastEquals() add up to some performance overhead since they get
executed for all sorts of record comparisons, not just for nodes and blobs. It would be better
to do the compaction checks only for those higher level comparisons.
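
To illustrate the memory argument in the first point, here is a minimal sketch of a map that keeps its entries in sorted primitive arrays and looks them up by binary search. It is not the actual CompactionMap code in Oak: the class name CompactLongMap, its methods, and the simplification of record identifiers to plain long keys are all assumptions made for this example. The point is only that two long[] arrays cost 16 bytes per entry, while a HashMap<Long, Long> typically needs several times that because of boxing and per-entry node objects.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a memory-efficient, immutable long-to-long map.
 * Not Oak's CompactionMap; keys stand in for (simplified) record identifiers.
 */
final class CompactLongMap {

    private final long[] keys;   // sorted in ascending order
    private final long[] values; // values[i] is the mapping for keys[i]

    private CompactLongMap(long[] keys, long[] values) {
        this.keys = keys;
        this.values = values;
    }

    /**
     * Builds a map from parallel arrays. Assumes the keys are unique;
     * the input arrays are not modified.
     */
    static CompactLongMap of(long[] keys, long[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        int n = keys.length;

        // Sort an index permutation by key so both arrays can be reordered together.
        // The boxed Integer[] is only needed temporarily while building.
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Long.compare(keys[a], keys[b]));

        long[] sortedKeys = new long[n];
        long[] sortedValues = new long[n];
        for (int i = 0; i < n; i++) {
            sortedKeys[i] = keys[order[i]];
            sortedValues[i] = values[order[i]];
        }
        return new CompactLongMap(sortedKeys, sortedValues);
    }

    /** Returns the value mapped to the key, or defaultValue if the key is absent. */
    long get(long key, long defaultValue) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : defaultValue;
    }

    /** Number of entries in the map. */
    int size() {
        return keys.length;
    }
}
```

Lookups take O(log n) time instead of the O(1) of a hash map, which is the trade-off for the smaller footprint. Limiting how many entries get stored in the first place, as suggested above, would reduce both the memory use and the lookup cost.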

I'll take a look at fixing the above issues.

> TarMK compaction
> ----------------
>
>                 Key: OAK-1804
>                 URL: https://issues.apache.org/jira/browse/OAK-1804
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segmentmk
>            Reporter: Jukka Zitting
>            Assignee: Alex Parvulescu
>              Labels: production, tools
>             Fix For: 1.0.1, 1.1
>
>         Attachments: SegmentNodeStore.java.patch, compact-on-flush.patch, compaction-map-as-bytebuffer.patch, compaction.patch, fast-equals.patch
>
>
> The TarMK would benefit from periodic "compact" operations that would traverse and recreate (parts of) the content tree in order to optimize the storage layout. More specifically, such compaction would:
> * Optimize performance by increasing locality and reducing duplication, both of which improve the effectiveness of caching.
> * Allow the garbage collector to release more unused disk space by removing references to segments where only a subset of content is reachable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
