avro-commits mailing list archives

From b...@apache.org
Subject [16/35] avro git commit: AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.
Date Sat, 05 Nov 2016 20:20:33 GMT
AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.

Java and other implementations require a CRC32 checksum of the
uncompressed content in order to read the data. This implements the
checksum, with backward compatibility for files written by old versions
of avro-ruby. If the checksum doesn't match, or if decompression fails
after the last 4 bytes are stripped as the checksum, avro-ruby
decompresses the entire incoming buffer and passes it on, assuming that
the file is from an old writer.

Closes #121.
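
The block layout described above can be sketched in isolation. This is a
minimal illustration, not the avro-ruby code: the helper names
append_crc32, split_crc32 and checksum_matches? are hypothetical, and the
"compressed" bytes in the usage lines are an uncompressed stand-in so the
sketch runs without the snappy gem. It only shows the 4-byte big-endian
CRC32 trailer computed over the uncompressed data.

```ruby
require 'zlib'

# Hypothetical helpers (not part of avro-ruby) showing the block layout:
# <compressed bytes><4-byte big-endian CRC32 of the UNCOMPRESSED data>.

# Append the CRC32 of the uncompressed data to the compressed bytes.
def append_crc32(compressed, uncompressed)
  [compressed, Zlib.crc32(uncompressed)].pack('a*N')
end

# Split a framed block into [compressed_bytes, crc32].
def split_crc32(framed)
  raise ArgumentError, 'block shorter than 4 bytes' if framed.bytesize < 4
  body = framed.byteslice(0, framed.bytesize - 4)
  crc32 = framed.byteslice(-4, 4).unpack('N').first
  [body, crc32]
end

# True when the trailer matches the CRC32 of the decompressed data.
def checksum_matches?(uncompressed, crc32)
  Zlib.crc32(uncompressed) == crc32
end

# Usage. Assumption: the payload stands in for real Snappy output; actual
# code would pass the result of Snappy.deflate(payload) instead.
payload = 'green'.b
framed = append_crc32(payload, payload)
body, crc32 = split_crc32(framed)
puts checksum_matches?(body, crc32)
```

A reader that finds a mismatch cannot tell a corrupt block from an old
block without a trailer, which is why the change falls back to
decompressing the whole buffer in that case.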


Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/ba848e21
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/ba848e21
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/ba848e21

Branch: refs/heads/branch-1.8
Commit: ba848e21ab275d81b7f33bbcf124efe7f67822bc
Parents: 79a6d8d
Author: Ryan Blue <blue@apache.org>
Authored: Sat Sep 10 15:57:30 2016 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Sat Nov 5 13:15:47 2016 -0700

----------------------------------------------------------------------
 CHANGES.txt                     |  3 +++
 lang/ruby/lib/avro/data_file.rb | 19 ++++++++++++++++++-
 lang/ruby/test/test_io.rb       | 11 +++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/avro/blob/ba848e21/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index e5ba5de..a13442d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -48,6 +48,9 @@ Trunk (not yet released)
     AVRO-1908: Fix TestSpecificCompiler reference to private method.
     (blue)
 
+    AVRO-1873: Ruby: Add CRC32 checksum to Snappy-compressed blocks.
+    (blue)
+
 Avro 1.8.1 (14 May 2016)
 
   INCOMPATIBLE CHANGES

http://git-wip-us.apache.org/repos/asf/avro/blob/ba848e21/lang/ruby/lib/avro/data_file.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/lib/avro/data_file.rb b/lang/ruby/lib/avro/data_file.rb
index c27c2dc..e465055 100644
--- a/lang/ruby/lib/avro/data_file.rb
+++ b/lang/ruby/lib/avro/data_file.rb
@@ -338,12 +338,29 @@ module Avro
 
       def decompress(data)
         load_snappy!
+        crc32 = data.slice(-4..-1).unpack('N').first
+        uncompressed = Snappy.inflate(data.slice(0..-5))
+
+        if crc32 == Zlib.crc32(uncompressed)
+          uncompressed
+        else
+          # older versions of avro-ruby didn't write the checksum, so if it
+          # doesn't match this must assume that it wasn't there and return
+          # the entire payload uncompressed.
+          Snappy.inflate(data)
+        end
+      rescue Snappy::Error
+        # older versions of avro-ruby didn't write the checksum, so removing
+        # the last 4 bytes may cause Snappy to fail. recover by assuming the
+        # payload is from an older file and uncompress the entire buffer.
         Snappy.inflate(data)
       end
 
       def compress(data)
         load_snappy!
-        Snappy.deflate(data)
+        crc32 = Zlib.crc32(data)
+        compressed = Snappy.deflate(data)
+        [compressed, crc32].pack('a*N')
       end
 
       private

http://git-wip-us.apache.org/repos/asf/avro/blob/ba848e21/lang/ruby/test/test_io.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/test/test_io.rb b/lang/ruby/test/test_io.rb
index 153cb94..09d725d 100644
--- a/lang/ruby/test/test_io.rb
+++ b/lang/ruby/test/test_io.rb
@@ -340,6 +340,17 @@ EOS
       assert_equal(incorrect, 0)
     end
   end
+
+  def test_snappy_backward_compat
+    # a snappy-compressed block payload without the checksum
+    # this has no back-references, just one literal so the last 9
+    # bytes are the uncompressed payload.
+    old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+    uncompressed_bytes = "\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+    snappy = Avro::DataFile::SnappyCodec.new
+    assert_equal(uncompressed_bytes, snappy.decompress(old_snappy_bytes))
+  end
+
   private
 
   def check_no_default(schema_json)
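
The test's comment that the old payload is "just one literal" can be
checked by hand against the Snappy block format: a varint preamble holding
the uncompressed length, then a tag byte whose low two bits are 00 for a
literal and whose upper six bits hold the length minus one (for literals
of up to 60 bytes). The sketch below is a hypothetical helper, not part of
avro-ruby or the snappy gem, and it deliberately handles only this
single-short-literal case.

```ruby
# Hypothetical helper: decode a Snappy block made of exactly one short
# literal element. Anything else raises ArgumentError.
def decode_single_literal(block)
  bytes = block.bytes
  index = 0

  # Preamble: uncompressed length as a little-endian base-128 varint.
  length = 0
  shift = 0
  loop do
    byte = bytes.fetch(index)
    index += 1
    length |= (byte & 0x7f) << shift
    break if (byte & 0x80).zero?
    shift += 7
  end

  # Tag byte: low two bits 00 mark a literal; upper six bits are length - 1.
  tag = bytes.fetch(index)
  index += 1
  raise ArgumentError, 'not a literal element' unless (tag & 0x03).zero?
  raise ArgumentError, 'long literals are not handled' if (tag >> 2) >= 60
  literal_length = (tag >> 2) + 1

  remaining = bytes.size - index
  unless literal_length == length && remaining == length
    raise ArgumentError, 'block is not a single literal'
  end

  block.byteslice(index, literal_length)
end

# The bytes from test_snappy_backward_compat: 0x09 is the length (9) and
# 0x20 is the literal tag ((9 - 1) << 2).
old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e".b
puts decode_single_literal(old_snappy_bytes).bytesize
```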

