jackrabbit-oak-issues mailing list archives

From "Vikas Saurabh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-6269) Support non chunk storage in OakDirectory
Date Thu, 28 Sep 2017 06:34:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183612#comment-16183612 ]

Vikas Saurabh edited comment on OAK-6269 at 9/28/17 6:33 AM:
-------------------------------------------------------------

bq. Would be good to have OakDirectory#largeFile run with streaming mode.
So, the current {{BlackHoleBlobStore}} couldn't accommodate a single large blob as it couldn't allocate
sufficient space on the heap (maybe I could've played with Xmx to make it work... but that doesn't
seem reasonable). Anyway, I adapted that store to not keep blobs in memory, but hash calculation
is still involved - the test passes but takes more time.
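
For illustration only, here is a minimal sketch of a store-like sink that drains a blob stream without retaining it while still paying the hashing cost. It is not the actual {{BlackHoleBlobStore}} change: the class name {{DiscardingBlobSink}}, the {{drain}} method and the choice of SHA-256 are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of Oak.
final class DiscardingBlobSink {

    /** Outcome of draining a stream: bytes seen and their hex digest. */
    static final class Result {
        final long length;
        final String sha256Hex;

        Result(long length, String sha256Hex) {
            this.length = length;
            this.sha256Hex = sha256Hex;
        }
    }

    /**
     * Reads the stream to the end through a small fixed buffer, so heap use
     * stays constant regardless of blob size. SHA-256 is an assumption here;
     * the original comment only says that a hash is still calculated.
     */
    static Result drain(InputStream in) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read); // hashing is the remaining cost
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new Result(total, hex.toString());
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        Result r = drain(new ByteArrayInputStream(data));
        System.out.println(r.length + " bytes, sha256=" + r.sha256Hex);
    }
}
```

Because only the 8 KB buffer is ever held, the blob size no longer drives heap usage; the digest update over every byte is what would still make a large-file test slow.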

bq. Or better parametrize the OakDirectory test for both modes
Wrt the previous point: some writing tests that use a custom blob size or set the blob size don't
fit the bill in the streaming case.
_UPDATE_: Renamed {{OakDirectoryTest}} to {{OakDirectoryTestBase}} and created 2 implementations - {{StreamingOakDirectoryTest}}
and {{ChunkedOakDirectoryTest}} - which have the relevant bits changed between the 2 cases. The chunked one
also has 2 tests which weren't relevant for the streaming case.
That said, all tests pass (though largeFile takes time :-/) with a few sensible changes.

Have taken care of other points in that comment \[0].

bq. Can we have this as OSGi config similar to support for blobSize
-I couldn't find how blob size is configurable using OSGi... that said, it'd probably be too far
down the tracks to pass an enable/disable flag from {{LuceneIndexProviderService}} to {{BufferedOakDirectory}}.
The current logic is that BufferedOakDirectory sets up the backing OakDirectory for streaming
if {{\-Doak.lucene.enableSingleBlobIndexFiles}} is set to true. Currently, the default is also
true.-
_UPDATE_: Now this is exposed as an OSGi property of LuceneIndexProviderService that sets a static
boolean of BufferedOakDirectory.
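
As a rough sketch of the toggle described above (not the actual Oak code), a static flag can take its default from the system property and be overridden later by a service. Only the property name {{oak.lucene.enableSingleBlobIndexFiles}} and the default of true come from this comment; the class {{SingleBlobToggle}} and its methods are hypothetical.

```java
// Hypothetical stand-in for the static boolean mentioned above; not Oak's class.
final class SingleBlobToggle {

    /** Property name taken from the comment (set with -Doak.lucene.enableSingleBlobIndexFiles=...). */
    static final String PROP = "oak.lucene.enableSingleBlobIndexFiles";

    // Initialised once at class load; volatile so a later override is visible to all threads.
    private static volatile boolean enabled = readDefault();

    /** Returns the property value, falling back to true when it is not set. */
    static boolean readDefault() {
        // Boolean.getBoolean(PROP) would default to false, so parse with an explicit "true" fallback.
        return Boolean.parseBoolean(System.getProperty(PROP, "true"));
    }

    /** What a configuration service could call when its own setting changes. */
    static void setEnabled(boolean value) {
        enabled = value;
    }

    static boolean isEnabled() {
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("single blob index files enabled: " + isEnabled());
        setEnabled(false); // simulate an override coming from configuration
        System.out.println("after override: " + isEnabled());
    }
}
```

A static flag keeps the wiring short, at the cost of being process-wide: every directory instance in the JVM sees the same setting.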

Btw, with default set to true, all oak-lucene tests pass.

\[0]: Current work at https://github.com/catholicon/jackrabbit-oak/compare/trunk...catholicon:OAK-6269-non-chunking-OakDirectory?expand=1



was (Author: catholicon):
bq. Would be good to have OakDirectory#largeFile run with streaming mode.
So, the current {{BlackHoleBlobStore}} couldn't accommodate a single large blob as it couldn't allocate
sufficient space on the heap (maybe I could've played with Xmx to make it work... but that doesn't
seem reasonable). Anyway, I adapted that store to not keep blobs in memory, but hash calculation
is still involved - the test passes but takes more time.

bq. Or better parametrize the OakDirectory test for both modes
Wrt the previous point: some writing tests that use a custom blob size or set the blob size don't
fit the bill in the streaming case.

That said, all tests pass (though largeFile takes time :-/) with a few sensible changes.

Have taken care of other points in that comment \[0].

bq. Can we have this as OSGi config similar to support for blobSize
-I couldn't find how blob size is configurable using OSGi... that said, it'd probably be too far
down the tracks to pass an enable/disable flag from {{LuceneIndexProviderService}} to {{BufferedOakDirectory}}.
The current logic is that BufferedOakDirectory sets up the backing OakDirectory for streaming
if {{\-Doak.lucene.enableSingleBlobIndexFiles}} is set to true. Currently, the default is also
true.-
_UPDATE_: Now this is exposed as an OSGi property of LuceneIndexProviderService that sets a static
boolean of BufferedOakDirectory.

Btw, with default set to true, all oak-lucene tests pass.

\[0]: Current work at https://github.com/catholicon/jackrabbit-oak/compare/trunk...catholicon:OAK-6269-non-chunking-OakDirectory?expand=1


> Support non chunk storage in OakDirectory
> -----------------------------------------
>
>                 Key: OAK-6269
>                 URL: https://issues.apache.org/jira/browse/OAK-6269
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: lucene
>            Reporter: Chetan Mehrotra
>            Assignee: Vikas Saurabh
>             Fix For: 1.8
>
>         Attachments: 0001-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch, 0002-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch,
0003-OAK-6269-Support-non-chunk-storage-in-OakDirectory.patch
>
>
> Logging this issue based on offline discussion with [~catholicon].
> Currently OakDirectory stores files in chunks of 1 MB each. So a 1 GB file would be stored
in 1000+ chunks of 1 MB.
> This design was done to support direct usage of OakDirectory with Lucene, as Lucene makes
use of random IO. Chunked storage allows it to seek to a random position quickly. If the files
are stored as blobs then it's only possible to access them via streaming, which would be slow.
> As most setups now use copy-on-read and copy-on-write support and rely on a local copy of
the index, we can have an implementation which stores the file as a single blob.
> *Pros*
> * Quite a bit of reduction in the number of small blobs stored in the BlobStore, which should
reduce the GC time, especially for S3
> * Reduced overhead of storing a single file in the repository. Instead of an array of 1k blobids
we would store a single blobid
> * Potential improvement in IO cost as a file can be read in one connection and uploaded
in one.
> *Cons*
> It would not be possible to use OakDirectory directly (or it would be very slow) and we
would always need to make a local copy.
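
To make the chunk arithmetic in the description concrete, here is a small sketch under stated assumptions. Only the 1 MB chunk size comes from the issue text; the class {{ChunkLayout}} and its methods are hypothetical and are not OakDirectory code.

```java
// Hypothetical illustration of chunked addressing; not OakDirectory code.
final class ChunkLayout {

    /** 1 MB chunk size, as stated in the issue description. */
    static final int CHUNK_SIZE = 1024 * 1024;

    /** Number of chunks (and hence blob ids) needed for a file of the given length. */
    static long chunkCount(long fileLength) {
        return (fileLength + CHUNK_SIZE - 1) / CHUNK_SIZE; // ceiling division
    }

    /** Index of the chunk holding the given absolute file position. */
    static long chunkIndex(long position) {
        return position / CHUNK_SIZE;
    }

    /** Offset of the given absolute file position inside its chunk. */
    static int offsetInChunk(long position) {
        return (int) (position % CHUNK_SIZE);
    }

    public static void main(String[] args) {
        long oneGb = 1L << 30;
        System.out.println("chunks for 1 GB: " + chunkCount(oneGb)); // prints 1024

        // A random seek only has to load the one chunk that contains the position.
        long position = 5L * CHUNK_SIZE + 10;
        System.out.println("chunk " + chunkIndex(position)
                + ", offset " + offsetInChunk(position)); // prints chunk 5, offset 10
    }
}
```

With chunks, a seek costs one chunk lookup plus an in-chunk offset. With a single blob that is only readable as a stream, reaching the same position means reading or skipping everything before it, which is why the single-blob layout relies on a local copy of the index.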



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
