jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4014) The segment store should merge small TAR files into bigger ones
Date Tue, 19 Jul 2016 08:30:21 GMT

    [ https://issues.apache.org/jira/browse/OAK-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383788#comment-15383788 ]

Francesco Mari commented on OAK-4014:

[~mduerig] this issue does not describe a problem, but a general approach that can be applied
even in the context of the new generation-based garbage collection. One of the problems
that we constantly observe for large repositories is the exhaustion of file handles. One way
to mitigate this problem would be to merge the segments that have survived a certain number
of garbage collection cycles into larger files. I think that the approach described here can
be valuable.

> The segment store should merge small TAR files into bigger ones
> ---------------------------------------------------------------
>                 Key: OAK-4014
>                 URL: https://issues.apache.org/jira/browse/OAK-4014
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: 1.6, Segment Tar 0.0.6
> The cleanup process removes unused segments from TAR files and writes new generations
> of those TAR files without the removed segments.
> In the long run, some TAR files might end up smaller than the maximum size allowed
> for a TAR file. At the time this issue was created, the default maximum size of a TAR file
> was 256 MiB.
> If there are many small TAR files, it should be possible to merge them into bigger files.
> This way, we can reduce the total number of TAR files in the segment store, and thus the number
> of open file descriptors that Oak has to maintain.
> A possible implementation for the merge operation is the following:
> # Sort the list of TAR files by size, ascending.
> # Pick TAR files from the sorted list for as long as the sum of their sizes after the
> merge stays below 256 MiB.
> # Merge the picked files into a new TAR file and mark the picked files for deletion.
> # Continue picking TAR files from the sorted list until the list is exhausted or until
> it's only possible to pick a single TAR file.
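The steps above can be sketched roughly as follows. This is only an illustration of the grouping logic, not code from Oak: `TarMergePlanner`, `TarFile` and `plan` are hypothetical names, and the sketch assumes that the size of a merged file can be approximated by the sum of the sizes of its inputs, ignoring TAR headers and index entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical planner that groups small TAR files for merging. */
public class TarMergePlanner {

    /** Hypothetical stand-in for a TAR file: a name and a size in bytes. */
    public static final class TarFile {
        public final String name;
        public final long size;

        public TarFile(String name, long size) {
            this.name = name;
            this.size = size;
        }
    }

    /** Default maximum TAR file size mentioned in the issue (256 MiB). */
    public static final long MAX_TAR_SIZE = 256L * 1024 * 1024;

    /**
     * Returns groups of files to merge. Every group holds at least two files
     * whose summed size stays below maxSize.
     */
    public static List<List<TarFile>> plan(List<TarFile> files, long maxSize) {
        // Step 1: sort by size, ascending (on a copy, the input is untouched).
        List<TarFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((TarFile f) -> f.size));

        List<List<TarFile>> groups = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            // Step 2: pick files while the summed size stays below the limit.
            List<TarFile> group = new ArrayList<>();
            long total = 0;
            while (i < sorted.size() && total + sorted.get(i).size < maxSize) {
                total += sorted.get(i).size;
                group.add(sorted.get(i));
                i++;
            }
            // Step 4: stop once fewer than two files can be picked. The list
            // is sorted, so no later pair of files can fit either.
            if (group.size() < 2) {
                break;
            }
            // Step 3: the caller merges this group into a new TAR file and
            // marks the inputs for deletion.
            groups.add(group);
        }
        return groups;
    }
}
```

With files of 10, 20, 30, 200 and 250 MiB and the 256 MiB limit, this sketch yields a single group holding the three smallest files. The 200 MiB file cannot be combined with anything else, so planning stops there.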
> The merge process can run in a background thread, but it is important that it doesn't
> conflict with the cleanup operation, since merge and cleanup both change the representation
> of TAR files on the file system. Two possible solutions to avoid conflicts are:
> # Use a global lock for the whole set of TAR files.
> # Use a lock per TAR file. The cleanup and merge processes have to agree on the order
> in which they acquire the locks, so that they cannot deadlock.
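
The second option, a lock per TAR file with an agreed acquisition order, could look roughly like the sketch below. It is an assumption-laden illustration rather than Oak code: `TarFileLocks` and `withLocks` are hypothetical names, and the agreed order is assumed to be the natural ordering of the TAR file names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical registry holding one lock per TAR file name. */
public class TarFileLocks {

    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Runs the action while holding the locks of all named TAR files.
     * Locks are always acquired in sorted name order, so two callers with
     * overlapping sets (e.g. merge and cleanup) cannot deadlock each other.
     */
    public void withLocks(Collection<String> names, Runnable action) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // TreeSet sorts the names and drops duplicates.
            for (String name : new TreeSet<>(names)) {
                ReentrantLock lock = locks.computeIfAbsent(name, n -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            action.run();
        } finally {
            // Release in reverse order of acquisition, even if the action throws.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    /** Reports whether the calling thread holds the lock of the named file. */
    public boolean isHeldByCurrentThread(String name) {
        ReentrantLock lock = locks.get(name);
        return lock != null && lock.isHeldByCurrentThread();
    }
}
```

Both the merge and the cleanup process would wrap their file system changes in `withLocks`, passing the names of the TAR files they are about to rewrite. The first option, a single global lock, is simpler but prevents merge and cleanup from working on disjoint sets of files at the same time.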

This message was sent by Atlassian JIRA
