jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (OAK-2896) Putting many elements into a map results in many small segments.
Date Fri, 22 Jul 2016 14:11:23 GMT

     [ https://issues.apache.org/jira/browse/OAK-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francesco Mari updated OAK-2896:
--------------------------------
    Fix Version/s:     (was: Segment Tar 0.0.6)
                   Segment Tar 0.0.8

> Putting many elements into a map results in many small segments. 
> -----------------------------------------------------------------
>
>                 Key: OAK-2896
>                 URL: https://issues.apache.org/jira/browse/OAK-2896
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Michael Dürig
>            Assignee: Michael Dürig
>            Priority: Critical
>              Labels: performance
>             Fix For: 1.6, Segment Tar 0.0.8
>
>         Attachments: OAK-2896.png, OAK-2896.xlsx, size-dist.png
>
>
> There is an issue with how the HAMT implementation ({{SegmentWriter.writeMap()}}) interacts
with the limit of 256 segment references when putting many entries into the map: this limit is
regularly reached once the map contains about 200k entries. At that point segments get flushed
prematurely, resulting in more segments, thus more references, and thus even smaller segments. It
is common for segments to be as small as 7 KB, with a tar file containing up to 35k segments.
This is problematic because at this point handling the segment graph becomes expensive, both
memory- and CPU-wise. I have seen persisted segment graphs as big as 35 MB, where the usual size
is a couple of KB.
> As the HAMT map is used for storing the children of a node, this might have an adverse effect
on nodes with many child nodes.
> The following code can be used to reproduce the issue: 
> {code}
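> // segmentStore, getTracker() and V_11 are expected to come from the surrounding test class;
> // newHashMap() and Stopwatch are presumably Guava's.
> Random rnd = new Random();
> SegmentWriter writer = new SegmentWriter(segmentStore, getTracker(), V_11);
> MapRecord baseMap = null;
> // Runs until interrupted; each pass adds 1000 random entries to the map.
> for (;;) {
>     Map<String, RecordId> map = newHashMap();
>     for (int k = 0; k < 1000; k++) {
>         RecordId stringId = writer.writeString(String.valueOf(rnd.nextLong()));
>         map.put(String.valueOf(rnd.nextLong()), stringId);
>     }
>     Stopwatch w = Stopwatch.createStarted();
>     baseMap = writer.writeMap(baseMap, map);
>     // Print the current map size and the time taken by this writeMap() call.
>     System.out.println(baseMap.size() + " " + w.elapsed());
> }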
> {code}
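
The feedback loop described in the issue (more segments, thus more references, thus smaller segments) can be illustrated with a toy model. This is only a sketch under stated assumptions and is not Oak's SegmentWriter: the only number taken from the issue is the limit of 256 segment references. The class name, the record capacity MAX_RECORDS, and the rule that every record references one uniformly random earlier segment are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Toy model only: not Oak code, names and capacity are hypothetical.
class SegmentRefLimitToy {
    // Reference limit named in the issue.
    static final int MAX_REFS = 256;
    // Hypothetical stand-in for the segment size limit, counted in records.
    static final int MAX_RECORDS = 1000;

    /**
     * Writes the given number of records and returns the number of records
     * that ended up in each segment, in the order the segments were flushed.
     */
    static List<Integer> simulate(int records, long seed) {
        Random rnd = new Random(seed);
        List<Integer> flushedSizes = new ArrayList<>();
        Set<Integer> refs = new HashSet<>(); // distinct segments referenced by the current segment
        int inCurrent = 0;                   // records buffered in the current segment
        for (int i = 0; i < records; i++) {
            int flushed = flushedSizes.size();
            if (flushed > 0) {
                // Assumption: each record references one random earlier segment.
                int target = rnd.nextInt(flushed);
                if (refs.size() == MAX_REFS && !refs.contains(target)) {
                    // No room for another reference: flush prematurely.
                    flushedSizes.add(inCurrent);
                    refs.clear();
                    inCurrent = 0;
                }
                refs.add(target);
            }
            inCurrent++;
            if (inCurrent == MAX_RECORDS) {
                // Regular flush: the segment is full.
                flushedSizes.add(inCurrent);
                refs.clear();
                inCurrent = 0;
            }
        }
        if (inCurrent > 0) {
            flushedSizes.add(inCurrent); // last, possibly partial, segment
        }
        return flushedSizes;
    }

    public static void main(String[] args) {
        List<Integer> sizes = simulate(1_000_000, 42L);
        System.out.println("segments written: " + sizes.size());
        System.out.println("records in first segment: " + sizes.get(0));
        System.out.println("records in second-to-last segment: " + sizes.get(sizes.size() - 2));
    }
}
```

In this model the reference limit cannot bind while there are at most 256 earlier segments, so early segments fill up completely. Once more segments exist, a segment is flushed as soon as it needs a 257th distinct reference, which happens sooner the more segments there are to choose from.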



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
