flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5944) Flink should support reading Snappy Files
Date Sun, 24 Sep 2017 07:52:02 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178113#comment-16178113 ]

ASF GitHub Bot commented on FLINK-5944:
---------------------------------------

Github user haohui commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4683#discussion_r140649772
  
    --- Diff: flink-core/pom.xml ---
    @@ -52,6 +52,12 @@ under the License.
     			<artifactId>flink-shaded-asm</artifactId>
     		</dependency>
     
    +		<dependency>
    +			<groupId>org.apache.flink</groupId>
    +			<artifactId>flink-shaded-hadoop2</artifactId>
    +			<version>${project.version}</version>
    +		</dependency>
    --- End diff --
    
    Including Hadoop as a dependency in flink-core can be problematic for a number of downstream projects.
    
    What exactly is the difference between the Hadoop and vanilla Snappy codecs? Is it just that the Hadoop Snappy codec adds extra framing?
    
    



> Flink should support reading Snappy Files
> -----------------------------------------
>
>                 Key: FLINK-5944
>                 URL: https://issues.apache.org/jira/browse/FLINK-5944
>             Project: Flink
>          Issue Type: New Feature
>          Components: Batch Connectors and Input/Output Formats
>            Reporter: Ilya Ganelin
>            Assignee: Mikhail Lipkovich
>              Labels: features
>
> Snappy is a widely used compression format that offers fast compression and decompression.
> This can be easily implemented by creating a SnappyInflaterInputStreamFactory and updating initDefaultInflateInputStreamFactories in FileInputFormat.
> Flink already includes the Snappy dependency in the project.
> There is a minor gotcha: if we wish to use this with Hadoop, we must provide two separate implementations, since Hadoop uses a different version of the Snappy format than Snappy Java (the xerial/snappy library included in Flink).
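
The gotcha above, and the framing question in the review comment, can be illustrated with a rough sketch. It assumes that Hadoop's block compressor streams wrap each block as a 4-byte big-endian uncompressed length followed by one or more chunks, each prefixed with its own 4-byte compressed length, while snappy-java's stream classes use a different container of their own. That layout is an assumption stated from memory, not something confirmed in this thread. The class name HadoopBlockFramingSketch and the ChunkCodec hook are hypothetical; an identity codec stands in for the raw Snappy calls so the example runs with the JDK alone.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HadoopBlockFramingSketch {

    /** Hypothetical hook for the raw codec, e.g. a raw Snappy compress or uncompress call. */
    interface ChunkCodec {
        byte[] apply(byte[] chunk) throws IOException;
    }

    /** Writes one block as: [uncompressed length][chunk length][chunk bytes]... (assumed layout). */
    static byte[] frame(byte[] raw, int maxChunk, ChunkCodec compressor) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(raw.length); // big-endian total uncompressed length of the block
        for (int off = 0; off < raw.length; off += maxChunk) {
            int len = Math.min(maxChunk, raw.length - off);
            byte[] compressed = compressor.apply(Arrays.copyOfRange(raw, off, off + len));
            out.writeInt(compressed.length); // big-endian length of this compressed chunk
            out.write(compressed);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads blocks in the assumed layout and returns the concatenated uncompressed bytes. */
    static byte[] unframe(byte[] framed, ChunkCodec decompressor) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        while (in.available() > 0) {
            int expected = in.readInt(); // uncompressed size of this block
            int produced = 0;
            while (produced < expected) {
                byte[] chunk = new byte[in.readInt()];
                in.readFully(chunk);
                byte[] plain = decompressor.apply(chunk);
                result.write(plain);
                produced += plain.length;
            }
        }
        return result.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ChunkCodec identity = chunk -> chunk; // stand-in for the real codec
        byte[] raw = "hello snappy".getBytes(StandardCharsets.UTF_8);
        byte[] framed = frame(raw, 5, identity);
        System.out.println(new String(unframe(framed, identity), StandardCharsets.UTF_8));
    }
}
```

If the assumption holds, a Hadoop-compatible reader would only need this thin framing layer on top of the raw Snappy block codec, which might avoid pulling Hadoop into flink-core; whether that covers every Hadoop-written file is not established here.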



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
