jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-6269) Support non chunk storage in OakDirectory
Date Tue, 26 Sep 2017 10:04:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180553#comment-16180553
] 

Chetan Mehrotra edited comment on OAK-6269 at 9/26/17 10:03 AM:
----------------------------------------------------------------

[~catholicon] Patch looks good. Some feedback below

{noformat}
+    /**
+     * @return if the file implementation supports copying data from {@link DataInput} directly.
+     */
+    boolean supportsCopying();
{noformat}

Maybe rename this to {{supportsCopyFromDataInput}}.

{noformat}
+    /** Copy numBytes bytes from input to ourself. */
+    public void copyBytes(DataInput input, long numBytes) throws IOException {
{noformat}

Add {{@Override}} here.

Some other points:
* Earlier we saw some bugs due to integer overflow. It would be good to have OakDirectory#largeFile run in streaming mode as well, or better, parametrize the OakDirectory test for both modes.
* Checking the test coverage of OakStreamingIndexFile shows no coverage for the following and a few others. It would be good to have higher test coverage for this class:
** {{copyBytes}}
** uniqueKey clause
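
To illustrate the integer overflow concern: the earlier bugs are not described in this comment, so the following is only a minimal sketch of the general class of problem, assuming a 1 MB chunk size as in the issue description. The class and method names ({{ChunkOffsetOverflow}}, {{offsetWithIntMath}}, {{offsetWithLongMath}}) are hypothetical and not part of the patch. It shows why a large-file test is useful: offsets past 2 GB silently wrap when computed with {{int}} arithmetic.

```java
public class ChunkOffsetOverflow {
    // Assumed chunk size of 1 MB, taken from the issue description.
    static final int CHUNK_SIZE = 1024 * 1024;

    // Buggy variant: the multiplication is done in int and overflows
    // before the result is widened to long.
    static long offsetWithIntMath(int chunkIndex) {
        return chunkIndex * CHUNK_SIZE;
    }

    // Safe variant: widen one operand to long before multiplying.
    static long offsetWithLongMath(int chunkIndex) {
        return (long) chunkIndex * CHUNK_SIZE;
    }

    public static void main(String[] args) {
        int chunkIndex = 3000; // roughly 3 GB into the file
        System.out.println("int math:  " + offsetWithIntMath(chunkIndex));
        System.out.println("long math: " + offsetWithLongMath(chunkIndex));
    }
}
```

A test that only writes small files would pass with either variant, which is why running the large-file test in both modes matters.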


was (Author: chetanm):
[~catholicon] Patch looks good. Some feedback below

{noformat}
+    /**
+     * @return if the file implementation supports copying data from {@link DataInput} directly.
+     */
+    boolean supportsCopying();
{noformat}

Maybe rename this to {{supportsCopyFromDataInput}}.

{noformat}
+    /** Copy numBytes bytes from input to ourself. */
+    public void copyBytes(DataInput input, long numBytes) throws IOException {
{noformat}

Add {{@Override}} here.

Some other points:
* Earlier we saw some bugs due to integer overflow. It would be good to have OakDirectory#largeFile run in streaming mode as well.
* Checking the test coverage of OakStreamingIndexFile shows no coverage for the following and a few others. It would be good to have higher test coverage for this class:
** {{copyBytes}}
** uniqueKey clause

> Support non chunk storage in OakDirectory
> -----------------------------------------
>
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>         Attachments: 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
>
>
> Logging this issue based on an offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each, so a 1 GB file is stored in 1000+ chunks of 1 MB.
> This design was chosen to support direct usage of OakDirectory with Lucene, as Lucene makes use of random IO. Chunked storage allows it to seek to a random position quickly. If the files were stored as blobs, they could only be accessed via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on a local copy of the index, we can have an implementation which stores the file as a single blob.
> *Pros*
> * Considerable reduction in the number of small blobs stored in the BlobStore, which should reduce GC time, especially for S3.
> * Reduced overhead of storing a single file in the repository. Instead of an array of 1k blobids, we would store a single blobid.
> * Potential improvement in IO cost, as a file can be read in one connection and uploaded in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very slow), and we would always need to make a local copy.
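
The chunk arithmetic behind the description above can be sketched as follows. This is only an illustration, assuming fixed chunks of exactly 1 MB (1024 * 1024 bytes); the real OakDirectory layout may differ, and the names {{ChunkLayout}}, {{chunkCount}}, {{chunkIndex}} and {{offsetInChunk}} are hypothetical. It shows where the "1000+ chunks" figure comes from and why chunking makes a random seek cheap: the target chunk is found by a single division instead of streaming through the whole file.

```java
public class ChunkLayout {
    // Assumed chunk size of 1 MB, taken from the issue description.
    static final int CHUNK_SIZE = 1024 * 1024;

    // Number of chunks needed to hold a file of the given length,
    // rounding up so that a partial last chunk is counted.
    static long chunkCount(long fileLength) {
        return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE;
    }

    // Index of the chunk that contains the given absolute position.
    static long chunkIndex(long position) {
        return position / CHUNK_SIZE;
    }

    // Offset of the given absolute position within its chunk.
    static int offsetInChunk(long position) {
        return (int) (position % CHUNK_SIZE);
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024L * 1024L;
        System.out.println("chunks for 1 GB: " + chunkCount(oneGb));
        long position = 5_000_000L;
        System.out.println("chunk index: " + chunkIndex(position)
                + ", offset in chunk: " + offsetInChunk(position));
    }
}
```

With single-blob storage there is one blob id per file instead of one per chunk, at the cost of losing this direct position-to-chunk lookup.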



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
