avro-commits mailing list archives

From cutt...@apache.org
Subject svn commit: r811481 - in /hadoop/avro/trunk: CHANGES.txt src/doc/content/xdocs/spec.xml
Date Fri, 04 Sep 2009 16:40:00 GMT
Author: cutting
Date: Fri Sep  4 16:39:51 2009
New Revision: 811481

URL: http://svn.apache.org/viewvc?rev=811481&view=rev
AVRO-111.  Document sort ordering in specification.


Modified: hadoop/avro/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/avro/trunk/CHANGES.txt?rev=811481&r1=811480&r2=811481&view=diff
--- hadoop/avro/trunk/CHANGES.txt (original)
+++ hadoop/avro/trunk/CHANGES.txt Fri Sep  4 16:39:51 2009
@@ -47,6 +47,8 @@
     possible values are "increasing" (the default), "decreasing", and
     "ignore".  (cutting)
+    AVRO-111.  Document sort ordering in the specification. (cutting)
     AVRO-71.  C++: make deserializer more generic.  (Scott Banachowski

Modified: hadoop/avro/trunk/src/doc/content/xdocs/spec.xml
URL: http://svn.apache.org/viewvc/hadoop/avro/trunk/src/doc/content/xdocs/spec.xml?rev=811481&r1=811480&r2=811481&view=diff
--- hadoop/avro/trunk/src/doc/content/xdocs/spec.xml (original)
+++ hadoop/avro/trunk/src/doc/content/xdocs/spec.xml Fri Sep  4 16:39:51 2009
@@ -116,6 +116,12 @@
+		<li><code>order:</code> specifies how this field
+		  impacts sort ordering of this record (optional).
+		  Valid values are "ascending" (the default),
+		  "descending", or "ignore".  For more details on how
+		  this is used, see the <a href="#order">sort
+		  order</a> section below.</li>
@@ -474,6 +480,65 @@
+    <section id="order">
+      <title>Sort Order</title>
+      <p>Avro defines a standard sort order for data.  This permits
+	data written by one system to be efficiently sorted by another
+	system.  This can be an important optimization, as sort order
+	comparisons are sometimes the most frequent per-object
+	operation.  Note also that Avro binary-encoded data can be
+	efficiently ordered without deserializing it to objects.</p>
+      <p>Data items may only be compared if they have identical
+	schemas.  Pairwise comparisons are implemented recursively
+	with a depth-first, left-to-right traversal of the schema.
+	The first mismatch encountered determines the order of the
+	items.</p>
+      <p>Two items with the same schema are compared according to the
+	following rules.</p>
+      <ul>
+	<li><code>int</code>, <code>long</code>, <code>float</code>
+	  and <code>double</code> data is ordered by ascending numeric
+	  value.</li>
+	<li><code>boolean</code> data is ordered with false before true.</li>
+	<li><code>null</code> data is always equal.</li>
+	<li><code>string</code> data is compared lexicographically by
+	  Unicode code point.  Since UTF-8 is the binary encoding of
+	  strings, sorting by bytes and by code points is equivalent.</li>
+	<li><code>bytes</code> and <code>fixed</code> data are
+	  compared lexicographically by byte.</li>
+	<li><code>array</code> data is compared lexicographically by
+	  element.</li>
+	<li><code>enum</code> data is ordered by the symbol's position
+	  in the enum schema.  For example, an enum whose symbols are
+	  <code>["z", "a"]</code> would sort <code>"z"</code> values
+	  before <code>"a"</code> values.</li>
+	<li><code>union</code> data is first ordered by the position of
+	  the branch within the union and, within the same branch, by
+	  that branch's type.  For example, an <code>["int", "string"]</code>
+	  union would order all int values before all string values,
+	  with the ints and strings themselves ordered as defined
+	  above.</li>
+	<li><code>record</code> data is ordered lexicographically by
+	  field.  If a field specifies that its order is:
+	  <ul>
+	    <li><code>"ascending"</code>, then the order of its values
+	      is unaltered.</li>
+	    <li><code>"descending"</code>, then the order of its values
+	      is reversed.</li>
+	    <li><code>"ignore"</code>, then its values are ignored
+	      when sorting.</li>
+	  </ul>
+	</li>
+	<li><code>map</code> data may not be compared.  It is an error
+	  to attempt to compare data containing maps unless those maps
+	  are in an <code>"order":"ignore"</code> record field.
+	</li>
+      </ul>
+    </section>
       <title>Object Container Files</title>
       <p>Avro includes a simple object container file format.  A file
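The comparison rules in the new Sort Order section can be sketched in Python as below. This is a minimal, hypothetical illustration, not an Avro library API: the `compare` function and the value representation it assumes (schemas as parsed JSON, records as dicts keyed by field name, enum values as symbol strings, union values as `(branch_index, value)` tuples) are assumptions made for this example. Named-type references and NaN handling are deliberately left out.

```python
from functools import cmp_to_key


def compare(schema, a, b):
    """Return -1, 0 or 1 for two values sharing `schema`.

    Hypothetical sketch. Assumed representation: schema is parsed JSON
    (str for primitives, dict for complex types, list for unions); records
    are dicts, enums are symbol strings, unions are (branch_index, value).
    """
    if isinstance(schema, list):  # union: branch position first, then value
        (ia, va), (ib, vb) = a, b
        if ia != ib:
            return -1 if ia < ib else 1
        return compare(schema[ia], va, vb)
    t = schema if isinstance(schema, str) else schema["type"]
    if t == "null":
        return 0  # null data is always equal
    if t in ("int", "long", "float", "double", "boolean", "string", "bytes", "fixed"):
        # Python already orders False < True, str by code point, bytes by byte.
        return (a > b) - (a < b)
    if t == "enum":  # by the symbol's position in the schema
        pos = schema["symbols"].index
        return compare("int", pos(a), pos(b))
    if t == "array":  # element by element; a proper prefix sorts first
        for x, y in zip(a, b):
            c = compare(schema["items"], x, y)
            if c:
                return c
        return compare("int", len(a), len(b))
    if t == "record":  # field by field, honoring each field's "order"
        for field in schema["fields"]:
            order = field.get("order", "ascending")
            if order == "ignore":
                continue  # also lets records carry uncomparable maps
            c = compare(field["type"], a[field["name"]], b[field["name"]])
            if c:
                return -c if order == "descending" else c
        return 0
    if t == "map":
        raise TypeError("map data may not be compared")
    raise ValueError(f"unsupported schema in this sketch: {t!r}")


# Usage: a hypothetical record type with a descending and an ignored field.
PAIR = {"type": "record", "name": "Pair", "fields": [
    {"name": "key", "type": "string"},
    {"name": "count", "type": "long", "order": "descending"},
    {"name": "tags", "type": {"type": "map", "values": "string"}, "order": "ignore"},
]}
rows = [
    {"key": "b", "count": 1, "tags": {}},
    {"key": "a", "count": 1, "tags": {"x": "y"}},
    {"key": "a", "count": 7, "tags": {}},
]
rows.sort(key=cmp_to_key(lambda x, y: compare(PAIR, x, y)))
# rows is now ordered ("a", 7), ("a", 1), ("b", 1)
```

Wrapping the comparator with `functools.cmp_to_key` is simply the standard way to hand a three-way comparison to Python's `sort`; an actual Avro implementation may organize this differently.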

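The claim that binary-encoded data can be ordered without deserializing it to objects can be illustrated with a small Python sketch. It relies on Avro's binary encoding of a `long` as a zig-zag varint and of a `string` as a `long` byte length followed by UTF-8 bytes. The helper names (`read_long`, `compare_encoded_longs`, `compare_encoded_strings`, and the example-only encoders) are hypothetical and not part of any Avro library.

```python
def read_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one zig-zag varint at buf[pos]; return (value, position after it)."""
    acc = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        acc |= (byte & 0x7F) << shift  # low 7 bits, least significant group first
        if not byte & 0x80:            # high bit clear: last byte of the varint
            break
        shift += 7
    return (acc >> 1) ^ -(acc & 1), pos  # undo the zig-zag mapping


def compare_encoded_longs(a: bytes, b: bytes) -> int:
    """Compare two encoded longs numerically by decoding only the varints."""
    x, _ = read_long(a, 0)
    y, _ = read_long(b, 0)
    return (x > y) - (x < y)


def compare_encoded_strings(a: bytes, b: bytes) -> int:
    """Compare two encoded strings byte by byte, never decoding UTF-8 to str."""
    len_a, start_a = read_long(a, 0)
    len_b, start_b = read_long(b, 0)
    for i in range(min(len_a, len_b)):
        x, y = a[start_a + i], b[start_b + i]
        if x != y:
            return -1 if x < y else 1
    return (len_a > len_b) - (len_a < len_b)  # a proper prefix sorts first


# Example-only encoders (hypothetical helpers) used to build inputs.
def encode_long(n: int) -> bytes:
    z = (n << 1) ^ (n >> 63)  # zig-zag; assumes n fits in a signed 64-bit long
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    data = s.encode("utf-8")
    return encode_long(len(data)) + data
```

Encoded longs still have their varints decoded before comparison, because zig-zag bytes do not sort numerically: -1 encodes as `0x01` and 1 as `0x02`, but -2 encodes as `0x03`. String bodies, by contrast, can be compared directly as UTF-8 bytes, which agrees with code-point order.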