jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-6269) Support non chunk storage in OakDirectory
Date Thu, 28 Sep 2017 06:37:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183745#comment-16183745 ]

Vikas Saurabh commented on OAK-6269:

[~chetanm], I'd try to see what I can do for
bq. This may show better performance for S3DataStore
Other than that, I think the planned work for this feature is done at my end [0] (currently,
this is enabled by default).

[0]: https://github.com/catholicon/jackrabbit-oak/compare/trunk...catholicon:OAK-6269-non-chunking-OakDirectory?expand=1

> Support non chunk storage in OakDirectory
> -----------------------------------------
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>         Attachments: 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch,
> Logging this issue based on offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each. So a 1 GB file would be stored
in 1000+ chunks of 1 MB.
> This design was done to support direct usage of OakDirectory with Lucene, as Lucene makes
use of random I/O. Chunked storage allows it to seek to a random position quickly. If the files
are stored as Blobs then it is only possible to access them via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on a local copy of
the index, we can have an implementation which stores the file as a single blob.
> *Pros*
> * Quite a bit of reduction in the number of small blobs stored in BlobStore, which should
reduce the GC time, especially for S3
> * Reduced overhead of storing a single file in the repository. Instead of an array of 1k blobids
we would store a single blobid
> * Potential improvement in I/O cost, as the file can be read in one connection and uploaded
in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very slow), and we
would always need to make a local copy.
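
The trade-off described in the issue can be illustrated with a little arithmetic. The following is only a minimal sketch, not Oak code: the class and method names are hypothetical, and the 1 MB chunk size is taken to mean 1 MiB, which is an assumption. It shows how many chunks a 1 GB file needs, how a chunked layout maps a random file position to one chunk, and how many bytes a forward-only stream over a single blob would have to pass over to reach the same position.

```java
public final class ChunkLayoutSketch {

    // Chunk size from the issue description (1 MB), assumed here to be 1 MiB.
    public static final int CHUNK_SIZE = 1 << 20;

    private ChunkLayoutSketch() {
    }

    /** Number of chunks needed for a file of the given length (ceiling division). */
    public static long chunkCount(long fileLength) {
        return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    /** Index of the chunk containing the given absolute file position. */
    public static long chunkIndex(long position) {
        return position / CHUNK_SIZE;
    }

    /** Offset of the given position inside its chunk. */
    public static int offsetInChunk(long position) {
        return (int) (position % CHUNK_SIZE);
    }

    /**
     * Bytes a forward-only stream over a single blob must skip, starting
     * from the beginning of the blob, before it reaches the position.
     */
    public static long bytesSkippedWhenStreaming(long position) {
        return position;
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long seekTarget = 700L * CHUNK_SIZE + 123;

        System.out.println("chunks for a 1 GiB file: " + chunkCount(oneGiB));
        System.out.println("chunk holding the seek target: " + chunkIndex(seekTarget));
        System.out.println("offset inside that chunk: " + offsetInChunk(seekTarget));
        System.out.println("bytes skipped by a single-blob stream: "
                + bytesSkippedWhenStreaming(seekTarget));
    }
}
```

With a chunked layout a seek only has to load the one chunk that holds the target position, while a single-blob stream has to pass over everything before it. This is why the single-blob layout relies on a local copy of the index for random access.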

This message was sent by Atlassian JIRA
