jackrabbit-oak-issues mailing list archives

From Michael Dürig (JIRA) <j...@apache.org>
Subject [jira] [Commented] (OAK-5910) Reduce copying of data when reading mmapped records
Date Thu, 09 Mar 2017 09:02:38 GMT

    [ https://issues.apache.org/jira/browse/OAK-5910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902714#comment-15902714 ]

Michael Dürig commented on OAK-5910:

I think it makes perfect sense to explore this avenue. Apart from the potential performance
gain, it will also remove some memory pressure and the gc pressure that goes along with
it. This is probably hard to measure with our (micro) benchmarks, as the effect will likely
only show once there is a certain load on a system (many tar files, fragmentation, etc.). Still,
it would be good to run more of our benchmarks for a longer time to guard us against "surprises"
like the one in OAK-5853, where an innocent-looking change did not have the desired effect in
one area but had an adverse effect in another.

Finally, do we actually know why the code currently copies the data in these places? Is this
a leftover, or was there an intention behind it?

> Reduce copying of data when reading mmapped records
> ---------------------------------------------------
>                 Key: OAK-5910
>                 URL: https://issues.apache.org/jira/browse/OAK-5910
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-tar
>            Reporter: Alex Parvulescu
>            Assignee: Alex Parvulescu
>             Fix For: 1.8
>         Attachments: OAK-5910.patch
> The idea is to reduce the amount of extra byte buffers created when reading mmapped records,
> if possible pushing the ByteBuffer all the way to the consumer.
> For example, reading a String from a Segment right now means first reading the bytes
> of the record into a byte array, then creating a string with an encoding (which behind the
> scenes will copy the byte array again and run it through the decoder). An alternative is to
> call {{decode}} on the Charset and pass in the ByteBuffer, skipping the intermediate operations.
> There are a few cases of this I included in the patch, but there may be others (like
> the {{SegmentStream}}, which needs a full rewrite).
> Interested in what others think of this!
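
To make the two String-reading paths from the description concrete, here is a minimal sketch in plain java.nio. It is not taken from OAK-5910.patch: the class name DecodeSketch, both method names, the offset/length parameters and the choice of UTF-8 are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; none of these names come from the Oak code base.
public class DecodeSketch {

    // Path described as the current one: copy the record's bytes into an
    // intermediate array first, then build the String from that array.
    public static String decodeViaArray(ByteBuffer segment, int offset, int length) {
        ByteBuffer view = segment.duplicate(); // own position/limit, shared content
        view.position(offset);
        byte[] bytes = new byte[length];
        view.get(bytes);                       // explicit copy: buffer -> byte[]
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Alternative path: bound a view of the buffer to the record and hand it
    // to Charset.decode, so no intermediate byte[] is allocated here.
    public static String decodeDirect(ByteBuffer segment, int offset, int length) {
        ByteBuffer view = segment.duplicate();
        view.limit(offset + length);           // set limit before position
        view.position(offset);
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    public static void main(String[] args) {
        String record = "hello w\u00f6rld";
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);

        // A direct buffer stands in for a memory-mapped segment; two bytes of
        // padding on each side simulate neighbouring records.
        ByteBuffer segment = ByteBuffer.allocateDirect(payload.length + 4);
        segment.put(new byte[] {'x', 'x'});
        segment.put(payload);
        segment.put(new byte[] {'y', 'y'});
        segment.flip();

        String a = decodeViaArray(segment, 2, payload.length);
        String b = decodeDirect(segment, 2, payload.length);
        System.out.println(a.equals(record) && b.equals(record));
    }
}
```

Both methods work on a duplicate of the buffer, so the position and limit of the shared segment buffer are left untouched. Whether the direct path is measurably cheaper under load is the open question raised in the comment above.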

This message was sent by Atlassian JIRA
