jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OAK-3536) Indexing with Lucene and copy-on-read generate too much garbage in the BlobStore
Date Wed, 20 Jan 2016 15:13:39 GMT

     [ https://issues.apache.org/jira/browse/OAK-3536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari updated OAK-3536:
    Fix Version/s:     (was: 1.4)

> Indexing with Lucene and copy-on-read generate too much garbage in the BlobStore
> --------------------------------------------------------------------------------
>                 Key: OAK-3536
>                 URL: https://issues.apache.org/jira/browse/OAK-3536
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>    Affects Versions: 1.3.9
>            Reporter: Francesco Mari
>            Priority: Critical
>             Fix For: 1.6
> The copy-on-read strategy when using Lucene indexing performs too many copies of the
index files from the filesystem to the repository. Every copy discards the previously stored
binary, which sits there as garbage until the binary garbage collection kicks in. When the
load on the system is particularly intense, this behaviour makes the repository grow at an
unreasonably high pace.
> I spotted this on a system where some content is generated every day at a specific time.
The content generation process creates approx. 6 million new nodes, where each node contains
5 properties with small, random string values. Nodes are saved in batches of 1000 nodes each.
At the end of the content generation process, the nodes are deleted to deliberately generate
garbage in the Segment Store. This is part of a testing effort to assess the efficiency of
the online compaction.
> I was never able to complete the tests because the system ran out of disk space due to
a large number of unused binary values. When debugging the system, on a full 400 GB disk, the
segments containing nodes and property values occupied approx. 3 GB. The rest of the space was
occupied by binary values in the form of bulk segments.
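
To put the reported figures in proportion, the arithmetic can be sketched as follows. This is only a rough illustration: the constant and function names are hypothetical, the inputs are the approximate values quoted in the description, and it assumes the disk held nothing but the segments and the bulk segments.

```python
# Approximate figures quoted in the issue description (not measured here).
TOTAL_NODES = 6_000_000      # nodes created per content generation run
PROPERTIES_PER_NODE = 5      # small random string properties per node
BATCH_SIZE = 1_000           # nodes saved per batch
DISK_GB = 400                # size of the (full) disk
SEGMENT_GB = 3               # space used by node and property segments


def workload_summary():
    """Derive batch count, property count, and binary share of the disk."""
    batches = TOTAL_NODES // BATCH_SIZE
    properties = TOTAL_NODES * PROPERTIES_PER_NODE
    # Assumption: everything that is not node/property data is binary data.
    binary_gb = DISK_GB - SEGMENT_GB
    binary_share = binary_gb / DISK_GB
    return {
        "batches": batches,
        "properties": properties,
        "binary_gb": binary_gb,
        "binary_share": binary_share,
    }


summary = workload_summary()
print(summary)
```

Under these assumptions, roughly 397 of the 400 GB (about 99%) were taken up by binary values, against about 3 GB of actual node and property data.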

This message was sent by Atlassian JIRA
