jackrabbit-oak-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Updated] (OAK-4756) A parallel approach to garbage collection
Date Thu, 22 Dec 2016 14:37:58 GMT

     [ https://issues.apache.org/jira/browse/OAK-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Michael Dürig updated OAK-4756:
    Labels: gc scalability  (was: gc)

> A parallel approach to garbage collection
> -----------------------------------------
>                 Key: OAK-4756
>                 URL: https://issues.apache.org/jira/browse/OAK-4756
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>              Labels: gc, scalability
>             Fix For: 1.8
> Assuming that:
> # Logic record IDs are implemented.
> # TAR files are ordered in reverse chronological order.
> # When reading segments, TAR files are consulted in order.
> # Segments in recent TAR files shadow segments in older TAR files with the same segment
> A new algorithm for garbage collection can be implemented:
> # Define the input for the garbage collection process. The input consists of the current
set of TAR files and a set of record IDs representing the GC roots.
> # Traverse the GC roots and mark the records that are still in use. The mark phase traverses
the record graph and produces a list of record IDs. These record IDs are referenced directly
or indirectly by the given set of GC roots and need to be kept. The list of record IDs is
ordered by segment ID first and record number next. This way, it is possible to process this
list in one pass and figure out which segment and which record should be saved at the end
of the garbage collection.
> # Remove unused records from segments and rewrite them in a new set of TAR files. The
list is produced in the previous step is traversed. For each segment encountered, a new segment
is created containing only the records that were marked in the previous phase. This segment
is then saved in a new set of TAR files. The set of new TAR files is the result of the garbage
collection process. 
> # Add the new TAR files to the system. The system will append the new TAR files to the
segment store. The segments in these TAR files will shadow the ones in older TAR files.
> # Remove TAR files from the old generation. It is safe to do so because the new set of
TAR files are currently shadowing the initial set of TAR files.
> While the garbage collection process is running, the system can work as usual by starting
a fresh TAR file. The result of the garbage collection is made visible atomically only at
the end, when the new TAR files are integrated into the running system.

This message was sent by Atlassian JIRA

View raw message