jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-4631) Simplify the format of segments and serialized records
Date Thu, 15 Sep 2016 08:14:20 GMT

    [ https://issues.apache.org/jira/browse/OAK-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492715#comment-15492715 ]

Francesco Mari commented on OAK-4631:

It's also interesting to take the average of the values above, because it helps put this
information in perspective.

- source, Oak 1.0
135  KB    per data segment
52   bytes per map
0.28 bytes per list
7    bytes per template
5    bytes per node
- upgraded instance, pre OAK-4631
33 KB    per data segment
46 bytes per map
12 bytes per list
7  bytes per template
4  bytes per node
- upgraded instance, post OAK-4631
251 KB    per data segment
182 bytes per map
58  bytes per list
22  bytes per template
35  bytes per node

Records got bigger, that's undeniable. But as a consequence of this change, records are more
easily parseable, segments are better utilised, and 54% fewer segments are needed to store the
same data. Fewer segments mean smaller book-keeping data structures throughout the Segment
Store, especially when it comes to compaction. This change traded space for simplicity,
and I think there is some value in that.
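
To put a rough number on "records got bigger", the per-record averages of the upgraded instance can be compared before and after the change. The snippet below is only an illustrative sketch: the values are copied from the averages listed above, while the dictionary and function names are hypothetical and not part of Oak. The results are ratios of averages, not measurements of individual records.

```python
# Average record sizes in bytes, copied from the figures in this comment
# (upgraded instance, before and after OAK-4631).
pre = {"map": 46, "list": 12, "template": 7, "node": 4}
post = {"map": 182, "list": 58, "template": 22, "node": 35}


def growth_factors(before, after):
    """Return, per record type, how many times larger the average became."""
    return {kind: after[kind] / before[kind] for kind in before}


# Print one line per record type with its growth factor.
for kind, factor in growth_factors(pre, post).items():
    print(f"{kind:<8} {factor:.2f}x")
```

Under these figures the average node record grows the most and the average template record the least, which matches the trade of space for simplicity described above.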

> Simplify the format of segments and serialized records
> ------------------------------------------------------
>                 Key: OAK-4631
>                 URL: https://issues.apache.org/jira/browse/OAK-4631
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>             Fix For: Segment Tar 0.0.10
>         Attachments: OAK-4631-01.patch, OAK-4631-02.patch, OAK-4631-03.patch, OAK-4631-04.patch
> As discussed in [this thread|http://markmail.org/thread/3oxp6ydboyefr4bg], it might be
beneficial to simplify both the format of the segments and the way record IDs are serialised.
A new strategy needs to be investigated to reach the right compromise between performance,
disk space utilization and simplicity.

