From b...@apache.org
Subject [10/35] avro git commit: AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)
Date Sat, 05 Nov 2016 20:20:27 GMT
AVRO-1704: Add single-record encoding spec. (Contributed by Niels Basjes)

Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/1c9ef72b
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/1c9ef72b
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/1c9ef72b

Branch: refs/heads/branch-1.8
Commit: 1c9ef72b4b7f3b34d16748c7161ec26b193e7299
Parents: b550367
Author: Ryan Blue <blue@apache.org>
Authored: Sun Jul 24 15:47:36 2016 -0700
Committer: Ryan Blue <blue@apache.org>
Committed: Sat Nov 5 13:15:08 2016 -0700

 CHANGES.txt                    |  2 ++
 doc/src/content/xdocs/spec.xml | 36 ++++++++++++++++++++++++++++++++----
 2 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 0490e86..537b2b2 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -8,6 +8,8 @@ Trunk (not yet released)
     AVRO-1704: Java: Add support for single-message encoding. (blue)
+    AVRO-1704: Spec: Add single-message encoding format. (Niels Basjes via blue)

diff --git a/doc/src/content/xdocs/spec.xml b/doc/src/content/xdocs/spec.xml
index ec1f199..917d314 100644
--- a/doc/src/content/xdocs/spec.xml
+++ b/doc/src/content/xdocs/spec.xml
@@ -487,18 +487,18 @@
               value, followed by that many key/value pairs.  A block
               with count zero indicates the end of the map.  Each item
               is encoded per the map's value schema.</p>
             <p>If a block's count is negative, its absolute value is used,
               and the count is followed immediately by a <code>long</code>
               block <em>size</em> indicating the number of bytes in the
               block.  This block size permits fast skipping through data,
               e.g., when projecting a record to a subset of its fields.</p>
             <p>The blocked representation permits one to read and write
               maps larger than can be buffered in memory, since one can
               start writing items without knowing the full length of the
           <section id="union_encoding">
@@ -569,6 +569,34 @@
+      <section id="single_object_encoding">
+        <title>Single-object encoding</title>
+        <p>In some situations a single Avro serialized object is to be stored for a
+        longer period of time. One very common example is storing Avro records
+        for several weeks in an <a href="http://kafka.apache.org/">Apache Kafka</a> topic.</p>
+        <p>In the period after a schema change this persistence system will contain records
+        that have been written with different schemas. So the need arises to know which schema
+        was used to write a record to support schema evolution correctly.
+        In most cases the schema itself is too large to include in the message,
+        so this binary wrapper format supports the use case more effectively.</p>
+        <section id="single_object_encoding_spec">
+          <title>Single object encoding specification</title>
+          <p>Single Avro objects are encoded as follows:</p>
+          <ol>
+            <li>A two-byte marker, <code>C3 01</code>, to show that the message is Avro and uses this single-record format (version 1).</li>
+            <li>The 8-byte little-endian CRC-64-AVRO <a href="#schema_fingerprints">fingerprint</a> of the object's schema</li>
+            <li>The Avro object encoded using <a href="#binary_encoding">Avro's binary encoding</a></li>
+          </ol>
+        </section>
+        <p>Implementations use the 2-byte marker to determine whether a payload is Avro.
+          This check helps avoid expensive lookups that resolve the schema from a
+          fingerprint, when the message is not an encoded Avro payload.</p>
+      </section>
     <section id="order">
@@ -1237,7 +1265,7 @@
-      <section>
+      <section id="schema_fingerprints">
         <title>Schema Fingerprints</title>
         <p>"[A] fingerprinting algorithm is a procedure that maps an
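
The framing added by this commit (marker, fingerprint, body) can be sketched as below. This is only an illustration, not the Java implementation referenced in CHANGES.txt: the function names are hypothetical, the CRC-64-AVRO fingerprint is assumed to be computed elsewhere and passed in as an integer, and the body is assumed to be already Avro-binary-encoded.

```python
import struct
from typing import Optional, Tuple

MARKER = b"\xC3\x01"          # two-byte marker from the spec (format version 1)
HEADER_LEN = len(MARKER) + 8  # marker plus 8-byte fingerprint


def encode_single_object(fingerprint: int, body: bytes) -> bytes:
    """Wrap an already-encoded Avro body in the single-object framing.

    `fingerprint` is assumed to be the CRC-64-AVRO value of the writer
    schema, computed by the caller (hypothetical helper, not shown).
    """
    # Mask so a signed 64-bit value (e.g. a Java long) packs as unsigned.
    fp = fingerprint & 0xFFFFFFFFFFFFFFFF
    return MARKER + struct.pack("<Q", fp) + body


def decode_single_object(payload: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (fingerprint, body), or None if the payload is not framed.

    Checking the marker first avoids a schema lookup for non-Avro data.
    """
    if len(payload) < HEADER_LEN or payload[:2] != MARKER:
        return None
    (fp,) = struct.unpack_from("<Q", payload, len(MARKER))
    return fp, payload[HEADER_LEN:]


if __name__ == "__main__":
    # 0x0123456789ABCDEF is a made-up fingerprint; b"\x02" is the Avro
    # binary encoding of the long value 1.
    framed = encode_single_object(0x0123456789ABCDEF, b"\x02")
    print(framed.hex())
    print(decode_single_object(framed))
    print(decode_single_object(b"not avro"))
```

A reader would normally use the returned fingerprint as the key into whatever schema store it has; that lookup is deliberately left out, since the spec text above does not define one.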

