jackrabbit-oak-issues mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-2808) Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC
Date Thu, 23 Jul 2015 11:49:04 GMT

    [ https://issues.apache.org/jira/browse/OAK-2808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638688#comment-14638688 ]

Thomas Mueller commented on OAK-2808:

[~mmarth] and I have a new idea on how to solve this problem (and possibly others), but in
a slightly different way. The idea is to add new methods to the DataStore and BlobStore. Currently
we have "String writeBlob(InputStream in)", and we would add "String writeBlob(String type,
InputStream in)". Depending on the type, it would write to the default location / directory
/ datastore backend, or to another one. The returned identifier may or may not be different
depending on the type. Also, we might want to add a similar "readBlob" method with the type
parameter. This would be a more generic solution that can solve this problem plus other
problems, and might be less risky to implement (less risk of deleting the wrong files).
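
To make the proposal concrete, here is a rough sketch of what a type-aware store could look like.
It is only an illustration under assumptions: the names TypedBlobStore, InMemoryTypedBlobStore,
DEFAULT_TYPE, deleteBlob and count are hypothetical and not part of the existing Oak API, and the
in-memory maps merely stand in for separate directories or datastore backends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical interface for illustration; not the actual Oak BlobStore API. */
interface TypedBlobStore {
    /** Assumed name for the type used by the existing single-argument method. */
    String DEFAULT_TYPE = "default";

    /** Proposed addition: the type selects the location / backend to write to. */
    String writeBlob(String type, InputStream in) throws IOException;

    /** Proposed counterpart: read a blob from the backend selected by the type. */
    InputStream readBlob(String type, String blobId) throws IOException;

    /** Existing-style method, delegating to the default backend. */
    default String writeBlob(InputStream in) throws IOException {
        return writeBlob(DEFAULT_TYPE, in);
    }
}

/** Toy implementation: one in-memory map per type stands in for one backend per type. */
class InMemoryTypedBlobStore implements TypedBlobStore {
    private final Map<String, Map<String, byte[]>> backends = new HashMap<>();
    private long counter;

    @Override
    public String writeBlob(String type, InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        // The identifier scheme is an assumption; here it happens to embed the type.
        String id = type + "-" + (++counter);
        backends.computeIfAbsent(type, t -> new HashMap<>()).put(id, out.toByteArray());
        return id;
    }

    @Override
    public InputStream readBlob(String type, String blobId) throws IOException {
        Map<String, byte[]> backend = backends.get(type);
        byte[] data = backend == null ? null : backend.get(blobId);
        if (data == null) {
            throw new IOException("No blob " + blobId + " of type " + type);
        }
        return new ByteArrayInputStream(data);
    }

    /** Deletion only ever touches the backend of the given type. */
    boolean deleteBlob(String type, String blobId) {
        Map<String, byte[]> backend = backends.get(type);
        return backend != null && backend.remove(blobId) != null;
    }

    /** Number of blobs currently held for the given type. */
    int count(String type) {
        Map<String, byte[]> backend = backends.get(type);
        return backend == null ? 0 : backend.size();
    }
}
```

With something along these lines, the Lucene index code could write with a type such as "lucene"
and delete its own stale files directly, while application binaries written through the
single-argument method stay in the default backend and are never touched by that cleanup.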

I will open a new issue for this and link it here.

> Active deletion of 'deleted' Lucene index files from DataStore without relying on full scale Blob GC
> -----------------------------------------------------------------------------------------------------
>                 Key: OAK-2808
>                 URL: https://issues.apache.org/jira/browse/OAK-2808
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>              Labels: datastore, performance
>             Fix For: 1.3.4
>         Attachments: copyonread-stats.png
> With Lucene index files now stored in the DataStore, our usage pattern
> of the DataStore has changed between JR2 and Oak.
> With JR2 the writes were mostly application based, i.e. if the application
> stores a pdf/image file then that would be stored in the DataStore. JR2 by
> default would not write anything to the DataStore. Further, in deployments
> where a large amount of binary content is present, systems tend to
> share the DataStore to avoid duplication of storage. In such cases
> running Blob GC is a non-trivial task, as it involves a manual step and
> coordination across multiple deployments. Because of this, systems tend to
> run GC less frequently.
> Now with Oak, apart from the application, the Oak system itself *actively*
> uses the DataStore to store the index files for Lucene, and there the
> churn might be much higher, i.e. the frequency of creation and deletion of
> index files is a lot higher. This would accelerate the rate of garbage
> generation and thus put a lot more pressure on the DataStore storage
> requirements.
> Discussion thread http://markmail.org/thread/iybd3eq2bh372zrl

This message was sent by Atlassian JIRA
