storm-commits mailing list archives

From bo...@apache.org
Subject [1/3] storm git commit: STORM-1138: Storm-hdfs README should be updated with Avro Bolt information
Date Fri, 06 Nov 2015 18:27:10 GMT
Repository: storm
Updated Branches:
  refs/heads/master b57da7b9e -> 4fe62b2ca


STORM-1138: Storm-hdfs README should be updated with Avro Bolt information


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/507f295f
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/507f295f
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/507f295f

Branch: refs/heads/master
Commit: 507f295fc800e61486c55579ff05a13310494424
Parents: de579be
Author: Aaron Dossett <aaron.dossett@target.com>
Authored: Mon Nov 2 08:18:58 2015 -0600
Committer: Aaron Dossett <aaron.dossett@target.com>
Committed: Mon Nov 2 08:18:58 2015 -0600

----------------------------------------------------------------------
 external/storm-hdfs/README.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/507f295f/external/storm-hdfs/README.md
----------------------------------------------------------------------
diff --git a/external/storm-hdfs/README.md b/external/storm-hdfs/README.md
index e1f2b28..7819f81 100644
--- a/external/storm-hdfs/README.md
+++ b/external/storm-hdfs/README.md
@@ -277,6 +277,39 @@ public interface SequenceFormat extends Serializable {
 }
 ```
 
+## Support for Avro Files
+
+The `org.apache.storm.hdfs.bolt.AvroGenericRecordBolt` class allows you to write Avro objects directly to HDFS:
+ 
+```java
+        // sync the filesystem after every 1k tuples
+        SyncPolicy syncPolicy = new CountSyncPolicy(1000);
+
+        // rotate files when they reach 5MB
+        FileRotationPolicy rotationPolicy = new FileSizeRotationPolicy(5.0f, Units.MB);
+
+        FileNameFormat fileNameFormat = new DefaultFileNameFormat()
+                .withExtension(".avro")
+                .withPath("/data/");
+
+        // No SequenceFormat is needed here; `schema` is assumed to be a String
+        // holding the Avro schema's JSON, defined elsewhere.
+
+        AvroGenericRecordBolt bolt = new AvroGenericRecordBolt()
+                .withFsUrl("hdfs://localhost:54310")
+                .withFileNameFormat(fileNameFormat)
+                .withSchemaAsString(schema)
+                .withRotationPolicy(rotationPolicy)
+                .withSyncPolicy(syncPolicy);
+```
+
+The setup is very similar to the `SequenceFileBolt` example above.  The key difference is that instead of specifying a
+`SequenceFormat` you must provide a string representation of an Avro schema through the `withSchemaAsString()` method.
+An `org.apache.avro.Schema` object cannot be directly provided since it does not implement `Serializable`.
+
+The AvroGenericRecordBolt expects to receive tuples containing an Avro GenericRecord that conforms to the provided
+schema.
+
 ## Trident API
 storm-hdfs also includes a Trident `state` implementation for writing data to HDFS, with an API that closely mirrors
 that of the bolts.
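
Note on the `withSchemaAsString()` requirement described in the README text above: Storm serializes bolt instances with Java serialization when a topology is submitted, so every non-transient field of a bolt has to be `Serializable`. The sketch below illustrates that constraint using only the JDK. It is not the storm-hdfs implementation: `FakeSchema`, `ObjectConfiguredBolt`, `StringConfiguredBolt`, and `prepare()` are hypothetical stand-ins introduced here, with `FakeSchema` playing the role of the non-serializable `org.apache.avro.Schema`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SchemaAsStringSketch {

    // Hypothetical stand-in for org.apache.avro.Schema: deliberately NOT Serializable.
    static final class FakeSchema {
        final String json;
        private FakeSchema(String json) { this.json = json; }
        static FakeSchema parse(String json) { return new FakeSchema(json); }
    }

    // Hypothetical bolt holding the schema object directly; Java serialization rejects it.
    static final class ObjectConfiguredBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        final FakeSchema schema;
        ObjectConfiguredBolt(FakeSchema schema) { this.schema = schema; }
    }

    // Hypothetical bolt that ships only the schema string and re-parses it on the worker.
    static final class StringConfiguredBolt implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String schemaAsString;
        private transient FakeSchema schema; // rebuilt in prepare(), never serialized
        StringConfiguredBolt(String schemaAsString) { this.schemaAsString = schemaAsString; }
        void prepare() { this.schema = FakeSchema.parse(schemaAsString); }
        FakeSchema schema() { return schema; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Returns false when the object graph contains a non-serializable field.
    static boolean canSerialize(Object o) {
        try {
            serialize(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes and deserializes the bolt, mimicking shipping it to a worker.
    static StringConfiguredBolt roundTrip(StringConfiguredBolt bolt) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(serialize(bolt)))) {
            return (StringConfiguredBolt) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"type\":\"string\"}";

        System.out.println("object-configured bolt serializable: "
                + canSerialize(new ObjectConfiguredBolt(FakeSchema.parse(json))));

        StringConfiguredBolt shipped = roundTrip(new StringConfiguredBolt(json));
        shipped.prepare(); // the schema object is recreated from the string
        System.out.println("schema rebuilt on worker: " + shipped.schema().json);
    }
}
```

With the real Avro library, `Schema.toString()` produces the JSON form of a schema and `new Schema.Parser().parse(String)` rebuilds the object, so a string along those lines is presumably what `withSchemaAsString()` expects.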

