jackrabbit-oak-issues mailing list archives

From "Chetan Mehrotra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4810) FileDataStore: support SHA-2
Date Thu, 15 Sep 2016 08:50:20 GMT

    [ https://issues.apache.org/jira/browse/OAK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492777#comment-15492777 ]

Chetan Mehrotra commented on OAK-4810:

bq. I think default for writing (if not configured explicitly) could still be SHA-1.

The change can be made at any time, and it should not affect any other part much. So the default
value can simply be switched to SHA-256.

Once a binary has been added with any digest method, we do not need the method details when
reading, because the lookup is based purely on the id. Still, it would be good to encode the
algorithm in the id that is passed back to the NodeStore.
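
To illustrate encoding the algorithm in the id, here is a minimal sketch. It is not the
FileDataStore implementation: the class name DigestIds, the method newId, and the
"<algorithm>:<hex>" id layout are all assumptions made up for this example. Only
java.security.MessageDigest is a real API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Oak or Jackrabbit.
final class DigestIds {

    private DigestIds() {
    }

    /**
     * Hashes the content with the given JCA algorithm name (for example
     * "SHA-1" or "SHA-256") and returns an id of the assumed form
     * "<lowercase algorithm name>:<hex digest>".
     */
    static String newId(String algorithm, byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest: " + algorithm, e);
        }
        byte[] hash = md.digest(content);

        StringBuilder id = new StringBuilder(algorithm.toLowerCase(Locale.ROOT));
        id.append(':');
        for (byte b : hash) {
            // Mask to an unsigned value so every byte becomes two hex digits.
            id.append(String.format("%02x", b & 0xff));
        }
        return id.toString();
    }
}
```

With this layout, switching the default from SHA-1 to SHA-256 only changes the argument passed
to newId; ids that were written earlier keep their original tag.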

> FileDataStore: support SHA-2
> ----------------------------
>                 Key: OAK-4810
>                 URL: https://issues.apache.org/jira/browse/OAK-4810
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob
>            Reporter: Thomas Mueller
> The FileDataStore currently uses SHA-1, but that algorithm is deprecated. We should support
other algorithms as well (mainly SHA-256).
> Migration should be painless (no long downtime). I think default for writing (if not
configured explicitly) could still be SHA-1. But when reading, SHA-256 should also be supported
(depending on the identifier). That way, the new Oak version for all repositories (in a cluster
+ shared datastore) can be installed "slowly".
> After all repositories are running with the new Oak version, the configuration for SHA-256
can be enabled. That way, SHA-256 is used for new binaries. Both SHA-1 and SHA-256 are supported
for reading.
> One potential downside is deduplication would suffer a bit if a new Blob with same content
is added again as digest based match would fail. That can be mitigated by computing 2 types
of digest if need arises. The downsides are some additional file operations and CPU, and slower
migration to SHA-256.
> Some other open questions: 
> * While we are at it, it might make sense to additionally support SHA-3 and other algorithms
(make it configurable). But the length of the identifier alone might then not be enough information
to know which algorithm is used, so maybe add a prefix.
> * The number of subdirectory levels: should we keep it as is, or should we reduce it
(for example one level less).
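
The read-side question raised in the description (which algorithm produced a given identifier)
can be sketched as follows. This is an illustration under stated assumptions, not Oak code: the
class DigestAlgorithms, the method algorithmFor, and the "<algorithm>:<hex>" prefix convention
are hypothetical. The length rule relies only on the fact that a hex-encoded SHA-1 digest has
40 characters and a hex-encoded SHA-256 digest has 64. A SHA3-256 digest also has 64 hex
characters, which is why length alone stops being sufficient once SHA-3 is allowed.

```java
import java.util.Locale;

// Hypothetical helper, not part of Oak or Jackrabbit.
final class DigestAlgorithms {

    private DigestAlgorithms() {
    }

    /**
     * Returns the JCA algorithm name for an identifier.
     * An explicit prefix (assumed form "<algorithm>:<hex>") wins; legacy
     * identifiers without a prefix are classified by their hex length.
     */
    static String algorithmFor(String id) {
        int sep = id.indexOf(':');
        if (sep > 0) {
            // Assumed convention: the prefix is the lowercased JCA name.
            return id.substring(0, sep).toUpperCase(Locale.ROOT);
        }
        switch (id.length()) {
            case 40:
                return "SHA-1";   // 160 bits as hex
            case 64:
                return "SHA-256"; // 256 bits as hex, assuming no unprefixed SHA-3 ids exist
            default:
                throw new IllegalArgumentException("Cannot infer digest for id: " + id);
        }
    }
}
```

A resolver like this would let old SHA-1 identifiers and new SHA-256 identifiers coexist in a
shared datastore during a rolling upgrade, since readers never need to know the configured
write algorithm.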

This message was sent by Atlassian JIRA
