beam-commits mailing list archives

From "Marian Dvorsky (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-3874) Switch AvroIO sink default codec to Snappy
Date Sun, 18 Mar 2018 15:43:00 GMT
Marian Dvorsky created BEAM-3874:
------------------------------------

             Summary: Switch AvroIO sink default codec to Snappy
                 Key: BEAM-3874
                 URL: https://issues.apache.org/jira/browse/BEAM-3874
             Project: Beam
          Issue Type: Improvement
          Components: io-java-avro
            Reporter: Marian Dvorsky
            Assignee: Eugene Kirpichov


AvroIO currently uses CodecFactory.deflateCodec(6) as the default codec for writes.

Deflate at level 6 compresses well, but is quite expensive in CPU time.

The Snappy codec compresses less densely but much faster, and is typically a better CPU/storage
tradeoff except for very long-lived files.

We should consider switching the default to Snappy.
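
To get a rough feel for the CPU/storage tradeoff described above, here is a minimal sketch that uses only the JDK. It is not a Snappy benchmark: Snappy is not part of the JDK, so Deflater.BEST_SPEED (level 1) stands in as a "faster, less dense" setting next to the current level 6. The class name, the helper methods and the synthetic JSON-like records are all hypothetical and are not taken from AvroIO; real numbers would depend on the actual Avro data.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CodecTradeoffSketch {

    /** Deflates the input at the given level (0-9) and returns the compressed size in bytes. */
    public static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        // Drain the compressor; only the byte count matters here, so the output is discarded.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds synthetic, repetitive records (an assumption; not real Avro data). */
    public static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("{\"user\":\"user").append(i % 100)
              .append("\",\"clicks\":").append(i).append("}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleRecords(50_000);
        // Level 1 is a stand-in for a fast codec; level 6 matches deflateCodec(6).
        for (int level : new int[] {Deflater.BEST_SPEED, 6}) {
            long start = System.nanoTime();
            int size = compressedSize(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d in=%d out=%d time=%dus%n",
                    level, data.length, size, micros);
        }
    }
}
```

A single run like this is noisy (JIT warm-up, small input), so it only indicates the direction of the tradeoff. A proper comparison would use representative Avro files and a benchmarking harness.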



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
