tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2576) Add application/zstd detection and parser
Date Tue, 06 Mar 2018 21:45:06 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388577#comment-16388577 ]

Hudson commented on TIKA-2576:
------------------------------

FAILURE: Integrated in Jenkins build Tika-trunk #1446 (See [https://builds.apache.org/job/Tika-trunk/1446/])
TIKA-2576 -- Upgrade commons compress and add detection and parsing of (tallison: [https://github.com/apache/tika/commit/3701f2d340ee56af10aa1b6cc44375d71b50bb52])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (add) tika-parsers/src/test/resources/test-documents/testZSTD.zstd
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/pkg/CompressorParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/pkg/CompressorParser.java
* (edit) tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
* (edit) tika-parent/pom.xml
* (edit) tika-parsers/pom.xml
* (edit) CHANGES.txt


> Add application/zstd detection and parser
> -----------------------------------------
>
>                 Key: TIKA-2576
>                 URL: https://issues.apache.org/jira/browse/TIKA-2576
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, parser
>            Reporter: Andreas Meier
>            Priority: Minor
>         Attachments: huffman-compressed-larger, huffmann-compressed-larger-result.txt
>
>
> The IETF is currently reviewing the specification of Zstandard compression and the
> application/zstd media type: https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html
> As soon as application/zstd is accepted as a standard media type, it should be implemented.
> Possible mime detection for tika-mimetypes.xml (the second comment has to be changed when
> the standard is final):
> {code:xml}
>   <mime-type type="application/zstd">
>     <_comment>https://en.wikipedia.org/wiki/Zstandard</_comment>
>     <_comment>https://tools.ietf.org/id/draft-kucherawy-dispatch-zstd-01.html</_comment>
>     <magic priority="50">
>       <match value="0xFD2FB528" type="little32" offset="0"/>
>     </magic>
>     <glob pattern="*.zstd"/>
>   </mime-type>
> {code}
> commons-compress 1.16 and later provides a compressor and decompressor for the
> algorithm, based on com.github.luben zstd-jni: https://github.com/luben/zstd-jni
> Attached are a sample zstd file (huffman-compressed-larger) and the result after
> decompressing it.
> Decompression was done with commons-compress 1.16.1 and zstd-jni 1.3.3-3:
> {code:xml}
> <dependency>
>   <groupId>org.apache.commons</groupId>
>   <artifactId>commons-compress</artifactId>
>   <version>1.16.1</version>
> </dependency>
> <dependency>
>   <groupId>com.github.luben</groupId>
>   <artifactId>zstd-jni</artifactId>
>   <version>1.3.3-3</version>
> </dependency>
> {code}
> Regards
> Andreas
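
For illustration only, the magic rule proposed in the quoted description (value 0xFD2FB528, type little32, offset 0) can be sketched in plain Java as below. This is not Tika's detector code: the class name ZstdMagicSketch and the method looksLikeZstd are hypothetical, and the sketch assumes that "little32" means the first four bytes are read as a little-endian 32-bit integer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of Tika: mirrors the proposed magic rule only.
final class ZstdMagicSketch {
    // Magic value taken from the proposed <match> element.
    static final int ZSTD_MAGIC = 0xFD2FB528;

    // Returns true if the first four bytes, read as a little-endian
    // 32-bit integer at offset 0, equal the magic value.
    static boolean looksLikeZstd(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        int value = ByteBuffer.wrap(header, 0, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return value == ZSTD_MAGIC;
    }

    public static void main(String[] args) {
        // On disk the little-endian value appears as bytes 28 B5 2F FD.
        byte[] zstdHeader = {0x28, (byte) 0xB5, 0x2F, (byte) 0xFD, 0x00};
        byte[] gzipHeader = {0x1F, (byte) 0x8B, 0x08, 0x00};
        System.out.println(looksLikeZstd(zstdHeader));
        System.out.println(looksLikeZstd(gzipHeader));
    }
}
```

A check like this only confirms the four-byte frame magic; it says nothing about whether the rest of the stream is valid, which is why the actual decompression is left to commons-compress and zstd-jni.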



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
