jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-3122) Direct copy of the chunked data between blob stores
Date Tue, 21 Jul 2015 11:28:04 GMT

    [ https://issues.apache.org/jira/browse/OAK-3122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634982#comment-14634982 ]

Chetan Mehrotra commented on OAK-3122:

bq. It seems to me we need some sort of an ID translation layer to support AbstractBlobStore -> DataStore cases.

Yeah, looks like that needs to be done. Another possible way could have been to use a content
hash of the blobId itself, but then you have to be ready to deal with potential collisions, and
it goes against the content-hash-based design of the DataStores. So let's keep it simple:
# Get all the blobIds by using {{ReferenceCollector}}, as is done in Blob GC
# Use each blobId to read an input stream ({{blobIdSource}}). Copy that stream to the target BlobStore
and get {{blobIdTarget}}
# Keep the {{blobIdSource}} -> {{blobIdTarget}} mapping in MVStore as a persistent map
# Once the migration is done, have a way to store the mapping file
# At runtime the target BlobStore has to distinguish the two different types of blobIds. A new binary
gets a blobId of the target BlobStore; if a call is for an old blobId, the store looks it up in the mapping
and then fetches the mapped one
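The runtime lookup in the last step could be sketched roughly as below. This is a hypothetical illustration only: {{BlobIdTranslator}} is an invented name, and a plain {{HashMap}} stands in for the MVStore-backed persistent map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the ID translation layer: old (source) blobIds are
// resolved via the migration mapping before the target store is consulted;
// new blobIds pass through unchanged.
class BlobIdTranslator {

    // Stand-in for the MVStore-backed persistent map of blobIdSource -> blobIdTarget.
    private final Map<String, String> mapping = new HashMap<>();

    // Called during migration, once per copied binary.
    void record(String blobIdSource, String blobIdTarget) {
        mapping.put(blobIdSource, blobIdTarget);
    }

    // Called at read time: translate old ids, leave new (target) ids as-is.
    String resolve(String blobId) {
        return mapping.getOrDefault(blobId, blobId);
    }
}
```

In a real implementation the map would be opened from the stored MVStore mapping file rather than kept in memory.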

In the end the time saved might not be worth all this effort, and we would be better off performing
a full NodeStore-to-NodeStore cloning ;)

> Direct copy of the chunked data between blob stores
> ---------------------------------------------------
>                 Key: OAK-3122
>                 URL: https://issues.apache.org/jira/browse/OAK-3122
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: core, mongomk, upgrade
>            Reporter: Tomek Rękawek
>             Fix For: 1.4
> It could be useful to have a tool that allows copying blob chunks directly between different
stores, so users can quickly migrate their data without needing to touch the node store, consolidate
binaries, etc.
> Such a tool should have direct access to the methods operating on the binary blocks, implemented
in {{AbstractBlobStore}} and its subtypes:
> {code}
> void storeBlock(byte[] digest, int level, byte[] data);
> byte[] readBlockFromBackend(BlockId blockId);
> Iterator<String> getAllChunkIds(final long maxLastModifiedTime);
> {code}
> My proposal is to create a {{ChunkedBlobStore}} interface containing these methods, which
can be implemented by {{FileBlobStore}} and {{MongoBlobStore}}.
> Then we can enumerate all chunk ids, read the underlying blocks from the source blob store
and save them in the destination.
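The enumerate-and-copy loop described above could look roughly like this. Note this is a simplified, hypothetical sketch: the proposed {{ChunkedBlobStore}} interface does not exist yet, the real methods operate on (digest, level) pairs and {{BlockId}}s rather than plain chunk-id strings, and the in-memory store exists only to make the loop runnable.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Simplified, hypothetical take on the proposed ChunkedBlobStore interface;
// the actual proposal uses storeBlock(digest, level, data) and
// readBlockFromBackend(BlockId) as implemented in AbstractBlobStore subtypes.
interface ChunkedBlobStore {
    Iterator<String> getAllChunkIds(long maxLastModifiedTime);
    byte[] readChunk(String chunkId);
    void storeChunk(String chunkId, byte[] data);
}

// In-memory stand-in for FileBlobStore / MongoBlobStore, for illustration only.
class InMemoryChunkedBlobStore implements ChunkedBlobStore {
    private final Map<String, byte[]> chunks = new HashMap<>();

    public Iterator<String> getAllChunkIds(long maxLastModifiedTime) {
        return chunks.keySet().iterator();
    }

    public byte[] readChunk(String chunkId) {
        return chunks.get(chunkId);
    }

    public void storeChunk(String chunkId, byte[] data) {
        chunks.put(chunkId, data);
    }
}

class ChunkCopier {
    // Enumerate every chunk id in the source and copy each block verbatim
    // to the destination, bypassing the node store entirely.
    static void copyAll(ChunkedBlobStore source, ChunkedBlobStore target) {
        Iterator<String> ids = source.getAllChunkIds(0);
        while (ids.hasNext()) {
            String id = ids.next();
            target.storeChunk(id, source.readChunk(id));
        }
    }
}
```

Because chunks are copied under their existing ids, no blobId translation is needed between two chunk-based stores; that problem only arises for the AbstractBlobStore -> DataStore direction discussed in the comment above.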

This message was sent by Atlassian JIRA
