jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-3122) Direct copy of the chunked data between blob stores
Date Mon, 20 Jul 2015 11:12:04 GMT

    [ https://issues.apache.org/jira/browse/OAK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633430#comment-14633430 ]

Chetan Mehrotra commented on OAK-3122:

[~tomek.rekawek] Most people use {{FileDataStore}} instead of {{FileBlobStore}}. So we would
need to support copying data from, say, {{MongoBlobStore}} to {{FileDataStore}}.

A couple of points to note:
# {{FileDataStore}} stores the whole binary in a single file, unlike the chunked {{FileBlobStore}}
# Any such migration logic has to ensure that the blobId remains the same, as blobIds are referred to
from various places in the NodeStore and it would not be possible to change them easily
# BlobIds in chunk-based stores like {{MongoBlobStore}} and others have variable length,
while the current {{FileDataStore}} has a fixed-length id of 40 chars
# {{FileDataStore}} in Oak is used via the {{DataStoreBlobStore}} wrapper, which internally also
encodes the file length in the blobId. With that wrapper, blobIds look like _43844ed22d640a114134e5a25550244e8836c00c#28705_,
i.e. '<blobId>#<blobLength>'. So similar support still needs to be provided
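For illustration, splitting such an encoded id at the trailing '#' separator could look like the following. This is a minimal sketch; the class and method names are made up for the example, not Oak's actual API.

```java
// Hypothetical helper for ids of the form '<blobId>#<blobLength>', as in the
// DataStoreBlobStore example above. Names here are illustrative only.
public class BlobIdParser {

    // Content hash part, i.e. everything before the last '#'.
    static String contentId(String encodedId) {
        int idx = encodedId.lastIndexOf('#');
        return idx == -1 ? encodedId : encodedId.substring(0, idx);
    }

    // Encoded length part after the last '#'; -1 if no length is encoded.
    static long encodedLength(String encodedId) {
        int idx = encodedId.lastIndexOf('#');
        return idx == -1 ? -1 : Long.parseLong(encodedId.substring(idx + 1));
    }

    public static void main(String[] args) {
        String id = "43844ed22d640a114134e5a25550244e8836c00c#28705";
        System.out.println(contentId(id));     // 43844ed22d640a114134e5a25550244e8836c00c
        System.out.println(encodedLength(id)); // 28705
    }
}
```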

One possible approach:
# Get a handle to all top-level blobIds - {{GarbageCollectableBlobStore#getAllChunkIds}} provides
all ids, but we need access to the top-level ones
# Obtain an InputStream for each id and copy it to the filesystem, using the blobId as the name
of the file. Create the file the same way {{FileDataStore}} does, i.e.
with a 3-level directory depth
# Modify {{DataStoreBlobStore}} to allow extracting the length from an id, using the length
encoded in the blobId
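Step 2 above could be sketched roughly as follows, assuming the {{FileDataStore}} layout of three directory levels derived from the first six hex characters of the identifier (the helper names are illustrative, not Oak's API):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch only: resolves where a blob would live on disk under the assumed
// FileDataStore layout of three 2-char directory levels, e.g.
// <root>/43/84/4e/43844ed2... The names here are made up for illustration.
public class DataStoreLayout {

    static Path resolve(Path root, String blobId) {
        return root.resolve(blobId.substring(0, 2))
                   .resolve(blobId.substring(2, 4))
                   .resolve(blobId.substring(4, 6))
                   .resolve(blobId);
    }

    public static void main(String[] args) {
        Path p = resolve(Paths.get("datastore"),
                "43844ed22d640a114134e5a25550244e8836c00c");
        System.out.println(p); // three directory levels, then the full id
    }
}
```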

[~tmueller] [~amitjain] Thoughts?

> Direct copy of the chunked data between blob stores
> ---------------------------------------------------
>                 Key: OAK-3122
>                 URL: https://issues.apache.org/jira/browse/OAK-3122
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core, mongomk, upgrade
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
> It could be useful to have a tool that allows copying blob chunks directly between different
stores, so users can quickly migrate their data without needing to touch the node store, consolidate
binaries, etc.
> Such a tool should have direct access to the methods operating on the binary blocks, implemented
in {{AbstractBlobStore}} and its subtypes:
> {code}
> void storeBlock(byte[] digest, int level, byte[] data);
> byte[] readBlockFromBackend(BlockId blockId);
> Iterator<String> getAllChunkIds(final long maxLastModifiedTime);
> {code}
> My proposal is to create a {{ChunkedBlobStore}} interface containing these methods, which
can be implemented by {{FileBlobStore}} and {{MongoBlobStore}}.
> Then we can enumerate all chunk ids, read the underlying blocks from the source blob store
and save them in the destination.
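The enumerate-and-copy loop from the proposal could be sketched in a self-contained form like this. The interface mirrors the proposed {{ChunkedBlobStore}}, but the in-memory store and the simplified chunk-id-keyed method signatures are stand-ins for illustration, not Oak's real types.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Illustrative sketch of the proposed ChunkedBlobStore idea. Real Oak stores
// use storeBlock(digest, level, data) / readBlockFromBackend(BlockId); the
// string-keyed methods here are a simplification for the example.
public class ChunkCopySketch {

    interface ChunkedBlobStore {
        void storeChunk(String chunkId, byte[] data);
        byte[] readChunk(String chunkId);
        Iterator<String> getAllChunkIds(long maxLastModifiedTime);
    }

    // Toy in-memory implementation, standing in for FileBlobStore/MongoBlobStore.
    static class MemoryChunkStore implements ChunkedBlobStore {
        final Map<String, byte[]> chunks = new HashMap<>();
        public void storeChunk(String chunkId, byte[] data) { chunks.put(chunkId, data); }
        public byte[] readChunk(String chunkId) { return chunks.get(chunkId); }
        public Iterator<String> getAllChunkIds(long maxLastModifiedTime) {
            return chunks.keySet().iterator();
        }
    }

    // Copies every chunk from source to destination, keeping chunk ids
    // unchanged so references from the node store stay valid.
    static int copyAllChunks(ChunkedBlobStore source, ChunkedBlobStore dest) {
        int copied = 0;
        Iterator<String> ids = source.getAllChunkIds(Long.MAX_VALUE);
        while (ids.hasNext()) {
            String id = ids.next();
            dest.storeChunk(id, source.readChunk(id));
            copied++;
        }
        return copied;
    }

    public static void main(String[] args) {
        MemoryChunkStore src = new MemoryChunkStore();
        src.storeChunk("a1", new byte[] {1, 2});
        src.storeChunk("b2", new byte[] {3});
        MemoryChunkStore dst = new MemoryChunkStore();
        System.out.println(copyAllChunks(src, dst)); // 2
    }
}
```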

This message was sent by Atlassian JIRA
