hadoop-common-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-10047) Add a directbuffer Decompressor API to hadoop
Date Wed, 06 Nov 2013 19:24:18 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HADOOP-10047:

    Attachment: decompress-benchmark.tgz

A multi-threaded decompress benchmark comparing Deflater vs ZlibDirect

I built my hadoop-trunk branch with version 3.0.0-COMPRESS:

$ mvn package -Dhadoop.version=3.0.0-COMPRESS
$ LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/ java -jar target/compress-benchmark-1.0-SNAPSHOT.jar -n 200 -s 4096 -p 4

This spawns 4 threads and tries to decompress 200 tasks, each of 4 MB of raw data, in an executor,
and prints out the sum of the time spent, as measured with System.currentTimeMillis().
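
For readers without the attachment, the measurement described above can be sketched roughly as follows. This is not the code in decompress-benchmark.tgz: the class and method names are hypothetical, it uses java.util.zip's Deflater/Inflater on byte[] rather than the Hadoop codecs, and it assumes that -s is given in KB (so 4096 means 4 MB).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical harness for illustration only; not the attached benchmark.
class DecompressBenchSketch {

    // Build one compressed input up front so only inflation is timed.
    static byte[] makeCompressedInput(int rawBytes) {
        byte[] raw = new byte[rawBytes];
        Random rnd = new Random(42);
        for (int i = 0; i < raw.length; i++) {
            raw[i] = (byte) ('a' + rnd.nextInt(4)); // small alphabet, so zlib can compress it
        }
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] out = new byte[rawBytes + 1024];
        int len = 0;
        while (!deflater.finished() && len < out.length) {
            len += deflater.deflate(out, len, out.length - len);
        }
        deflater.end();
        return Arrays.copyOf(out, len);
    }

    // Inflate one task and return the elapsed wall-clock milliseconds.
    static long timeOneTask(byte[] compressed, int rawBytes) throws DataFormatException {
        long start = System.currentTimeMillis();
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] dst = new byte[rawBytes];
        int n = 0;
        while (!inflater.finished() && n < dst.length) {
            n += inflater.inflate(dst, n, dst.length - n);
        }
        inflater.end();
        if (n != rawBytes) {
            throw new IllegalStateException("short inflate: " + n);
        }
        return System.currentTimeMillis() - start;
    }

    // Run `tasks` decompressions on `threads` workers and sum the per-task times.
    static long run(int threads, int tasks, int rawKb) throws Exception {
        int rawBytes = rawKb * 1024;
        byte[] compressed = makeCompressedInput(rawBytes);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> timeOneTask(compressed, rawBytes)));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Mirrors "-n 200 -s 4096 -p 4" under the KB assumption stated above.
        System.out.println("total ms: " + run(4, 200, 4096));
    }
}
```

Note that the printed figure is a sum over tasks, not wall-clock time for the whole run, so it can exceed the elapsed time when several threads work in parallel.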

And we get the following nearly linear trend with both methods

|| 4mb x || before || after ||

> Add a directbuffer Decompressor API to hadoop
> ---------------------------------------------
>                 Key: HADOOP-10047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10047
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>    Affects Versions: 2.3.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>              Labels: compression
>             Fix For: 3.0.0
>         Attachments: DirectCompressor.html, DirectDecompressor.html, HADOOP-10047-WIP.patch, HADOOP-10047-final.patch, HADOOP-10047-redo-WIP.patch, HADOOP-10047-trunk.patch, HADOOP-10047-with-tests.patch,
> With the Zero-Copy reads in HDFS (HDFS-5260), it becomes important to perform all I/O operations without copying data into byte[] buffers or other buffers which wrap over them.
> This is a proposal for adding a DirectDecompressor interface to the io.compress package, to indicate codecs which want to surface the direct buffer layer upwards.
> The implementation should work with direct heap/mmap buffers and cannot assume that .array() is available.
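
To illustrate the buffer-to-buffer idea in the description above, here is a minimal sketch. The BufferDecompressor interface and its decompress(src, dst) signature are assumptions made for this example and are not taken from the attached patches. The implementation relies on the ByteBuffer overloads of java.util.zip.Inflater and Deflater, which need Java 11 or later, and it never calls .array() on either buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class DirectBufferSketch {

    // Hypothetical shape for a buffer-to-buffer decompressor; not the interface from the patch.
    interface BufferDecompressor {
        void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException;
    }

    // zlib-backed implementation that works on direct or heap buffers alike.
    static final class ZlibBufferDecompressor implements BufferDecompressor {
        @Override
        public void decompress(ByteBuffer src, ByteBuffer dst) throws DataFormatException {
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(src); // src position advances as input is consumed
                while (!inflater.finished() && dst.hasRemaining()) {
                    int n = inflater.inflate(dst);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new DataFormatException("truncated or unsupported input");
                    }
                }
            } finally {
                inflater.end();
            }
        }
    }

    // Helper for the demo: compress a byte[] into a direct buffer.
    static ByteBuffer compressToDirect(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteBuffer out = ByteBuffer.allocateDirect(raw.length + 1024);
        while (!deflater.finished() && out.hasRemaining()) {
            deflater.deflate(out);
        }
        deflater.end();
        out.flip();
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = "hello direct buffers ".repeat(100).getBytes(StandardCharsets.US_ASCII);
        ByteBuffer src = compressToDirect(raw);
        ByteBuffer dst = ByteBuffer.allocateDirect(raw.length);

        // Callers should check hasArray() before ever touching array().
        System.out.println("src.hasArray() = " + src.hasArray());

        new ZlibBufferDecompressor().decompress(src, dst);
        dst.flip();
        byte[] roundTrip = new byte[dst.remaining()];
        dst.get(roundTrip);
        System.out.println("round trip ok = " + Arrays.equals(raw, roundTrip));
    }
}
```

The only copy in this sketch is the final dst.get(...) used to verify the result; the decompress call itself reads and writes the buffers directly.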

This message was sent by Atlassian JIRA
