hadoop-common-issues mailing list archives

From "Nicholas Carlini (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6837) Support for LZMA compression
Date Thu, 24 Jun 2010 18:01:06 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882250#action_12882250 ]

Nicholas Carlini commented on HADOOP-6837:
------------------------------------------

The Java code in the SDK hasn't been updated since version 4.61 (dated 23 November 2008),
so support for LZMA2 would need to rely on the C code or be ported to Java.

The compression ratios of LZMA and LZMA2 are nearly identical (within ±0.01% in the tests
I ran). LZMA2 also appears to be block-based and splittable, which would be a major
advantage.
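To illustrate why a block-based format helps with input splits, here is a minimal sketch that assumes the block start offsets are already known (for example from the Index that the .xz format stores near the end of the file) and that each block can be decoded on its own. `SplitAssignment` and `blocksForSplit` are hypothetical names, not Hadoop's API; the rule assumed here is that a split processes every block that starts inside its byte range, so each block is read by exactly one split.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitAssignment {
    // Hypothetical helper: returns the indices of blocks whose start offset lies in
    // the half-open range [splitStart, splitEnd). Assumes blockStarts is sorted ascending.
    static List<Integer> blocksForSplit(long[] blockStarts, long splitStart, long splitEnd) {
        List<Integer> owned = new ArrayList<>();
        for (int i = 0; i < blockStarts.length; i++) {
            if (blockStarts[i] >= splitStart && blockStarts[i] < splitEnd) {
                owned.add(i);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        long[] blockStarts = {0, 1_000, 2_500, 4_000};
        System.out.println(blocksForSplit(blockStarts, 0, 2_000));     // [0, 1]
        System.out.println(blocksForSplit(blockStarts, 2_000, 5_000)); // [2, 3]
    }
}
```

Because the ranges are half-open, a block that starts exactly on a split boundary belongs to the later split, and no block is decoded twice.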

On the differences between LZMA and LZMA2:

          LZMA2 is an extension on top of the original LZMA. LZMA2 uses
          LZMA internally, but adds support for flushing the encoder,
          uncompressed chunks, eases stateful decoder implementations,
          and improves support for multithreading.

http://tukaani.org/xz/xz-file-format.txt
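As a sketch of what "uncompressed chunks" means on the wire, the class below writes an LZMA2 stream made only of uncompressed chunks and walks chunk headers without decoding any LZMA data. The byte layout (control byte, 16-bit size minus one, 0x00 end marker, properties byte when control >= 0xC0) follows my reading of the LZMA2 decoders in liblzma and XZ for Java; the linked xz-file-format.txt does not itself spell out LZMA2's chunk layout, so treat these values as an assumption to verify against those sources. `Lzma2Chunks`, `storeUncompressed`, and `describeChunks` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

public class Lzma2Chunks {
    // An uncompressed chunk stores (size - 1) in 16 bits, so it holds at most 64 KiB.
    static final int MAX_UNCOMPRESSED_CHUNK = 1 << 16;

    // Hypothetical helper: wraps raw bytes as an LZMA2 stream of uncompressed chunks only.
    static byte[] storeUncompressed(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += MAX_UNCOMPRESSED_CHUNK) {
            int len = Math.min(MAX_UNCOMPRESSED_CHUNK, data.length - off);
            // 0x01: uncompressed chunk with dictionary reset (used for the first chunk)
            // 0x02: uncompressed chunk, dictionary kept
            out.write(off == 0 ? 0x01 : 0x02);
            out.write((len - 1) >>> 8);  // (size - 1), high byte
            out.write((len - 1) & 0xFF); // (size - 1), low byte
            out.write(data, off, len);
        }
        out.write(0x00); // end-of-stream marker
        return out.toByteArray();
    }

    // Hypothetical helper: lists chunk types and sizes. Assumes a well-formed stream.
    static List<String> describeChunks(byte[] s) {
        List<String> chunks = new ArrayList<>();
        int pos = 0;
        while (true) {
            int control = s[pos++] & 0xFF;
            if (control == 0x00) {
                chunks.add("end");
                return chunks;
            } else if (control == 0x01 || control == 0x02) {
                int size = u16(s, pos) + 1;
                pos += 2 + size; // skip size field and stored bytes
                chunks.add("uncompressed:" + size);
            } else if (control >= 0x80) {
                // Low 5 control bits are the top bits of (uncompressed size - 1).
                int uncompressed = ((control & 0x1F) << 16) + u16(s, pos) + 1;
                int compressed = u16(s, pos + 2) + 1;
                // A properties byte follows the header when control >= 0xC0.
                pos += 4 + (control >= 0xC0 ? 1 : 0) + compressed;
                chunks.add("lzma:" + uncompressed + "/" + compressed);
            } else {
                throw new IllegalArgumentException("invalid LZMA2 control byte: " + control);
            }
        }
    }

    // Reads a big-endian unsigned 16-bit value.
    private static int u16(byte[] s, int pos) {
        return ((s[pos] & 0xFF) << 8) | (s[pos + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] stream = storeUncompressed(new byte[70_000]);
        System.out.println(describeChunks(stream)); // [uncompressed:65536, uncompressed:4464, end]
    }
}
```

Storing incompressible data this way costs only three header bytes per 64 KiB, which is the point of having uncompressed chunks at all.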

I did have to add support for flushing the encoder to the Java code; flushing the encoder
still produces valid LZMA-compressed output.
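The patch's LZMA flush code is not reproduced here. As a stand-in, this sketch uses the JDK's Deflate encoder (`java.util.zip.Deflater` with `SYNC_FLUSH`), not LZMA, to show what "flushing the encoder" buys: everything written so far becomes decodable while the stream stays open for more data. `FlushDemo`, `writeAndFlush`, and `inflateAvailable` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class FlushDemo {
    // Feeds input to the encoder and SYNC_FLUSHes it, returning every byte emitted.
    // The deflate stream is NOT finished, so more data could still be appended.
    static byte[] writeAndFlush(Deflater deflater, byte[] input) {
        deflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        do {
            // Per the Deflater docs, a completely filled buffer means the flush must be repeated.
            n = deflater.deflate(buf, 0, buf.length, Deflater.SYNC_FLUSH);
            out.write(buf, 0, n);
        } while (n == buf.length);
        return out.toByteArray();
    }

    // Decodes everything currently decodable from a possibly unfinished stream.
    static byte[] inflateAvailable(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = inflater.inflate(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        Deflater deflater = new Deflater();
        byte[] flushed = writeAndFlush(deflater, "first record\n".getBytes(StandardCharsets.UTF_8));
        // The reader recovers the record even though the encoder was never finished.
        System.out.print(new String(inflateAvailable(flushed), StandardCharsets.UTF_8));
        deflater.end();
    }
}
```

An LZMA encoder without a flush operation can only make pending data visible by ending the stream, which is why the Java SDK code needed the addition described above.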

> Support for LZMA compression
> ----------------------------
>
>                 Key: HADOOP-6837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6837
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Nicholas Carlini
>            Assignee: Nicholas Carlini
>         Attachments: HADOOP-6837-lzma-java-20100623.patch
>
>
> Add support for LZMA (http://www.7-zip.org/sdk.html) compression, which generally achieves
> higher compression ratios than both gzip and bzip2.
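Comparing ratios needs a consistent measure. The JDK ships gzip (`java.util.zip`) but neither bzip2 nor LZMA, so this minimal sketch computes only the gzip baseline as compressed size divided by original size; the LZMA and bzip2 sizes would have to come from the 7-Zip SDK or another library, which is not shown. `GzipRatio` and the sample data are illustrative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipRatio {
    // Compressed size divided by original size: lower means better compression.
    static double gzipRatio(byte[] input) throws IOException {
        if (input.length == 0) {
            throw new IllegalArgumentException("ratio is undefined for empty input");
        }
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the final deflate block and the gzip trailer.
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(input);
        }
        return (double) compressed.size() / input.length;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) {
            sb.append("record ").append(i % 100).append('\n');
        }
        byte[] sample = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.printf("gzip ratio: %.4f%n", gzipRatio(sample));
    }
}
```

Running every codec over the same input bytes and reporting the same ratio keeps comparisons like "LZMA vs. gzip" meaningful.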

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

