storm-commits mailing list archives

From bo...@apache.org
Subject [1/4] storm git commit: STORM-2493: update documents to reflect the changes
Date Mon, 15 May 2017 21:54:52 GMT
Repository: storm
Updated Branches:
  refs/heads/master 7742398d0 -> d4ee957a9


STORM-2493: update documents to reflect the changes


Project: http://git-wip-us.apache.org/repos/asf/storm/repo
Commit: http://git-wip-us.apache.org/repos/asf/storm/commit/e80f9e20
Tree: http://git-wip-us.apache.org/repos/asf/storm/tree/e80f9e20
Diff: http://git-wip-us.apache.org/repos/asf/storm/diff/e80f9e20

Branch: refs/heads/master
Commit: e80f9e20db208555753b93f48c4af175dbe47c6a
Parents: 500691d
Author: vesense <best.wangxin@163.com>
Authored: Thu Apr 27 12:40:57 2017 +0800
Committer: vesense <best.wangxin@163.com>
Committed: Thu Apr 27 12:40:57 2017 +0800

----------------------------------------------------------------------
 docs/Acking-framework-implementation.md | 11 ++--
 docs/DSLs-and-multilang-adapters.md     |  3 +-
 docs/Implementation-docs.md             |  1 +
 docs/index.md                           |  6 +-
 docs/storm-pmml.md                      | 37 +++++++++++
 docs/storm-rocketmq.md                  | 94 ++++++++++++++++++++++++++++
 6 files changed, 145 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/Acking-framework-implementation.md
----------------------------------------------------------------------
diff --git a/docs/Acking-framework-implementation.md b/docs/Acking-framework-implementation.md
index 5ca5d93..f181e98 100644
--- a/docs/Acking-framework-implementation.md
+++ b/docs/Acking-framework-implementation.md
@@ -1,13 +1,16 @@
 ---
+title: Acking framework implementation
 layout: documentation
+documentation: true
 ---
-[Storm's acker](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/acker.clj#L28) tracks completion of each tupletree with a checksum hash: each time a tuple is sent, its value is XORed into the checksum, and each time a tuple is acked its value is XORed in again. If all tuples have been successfully acked, the checksum will be zero (the odds that the checksum will be zero otherwise are vanishingly small).
+
+[Storm's acker]({{page.git-blob-base}}/storm-client/src/jvm/org/apache/storm/daemon/Acker.java) tracks completion of each tupletree with a checksum hash: each time a tuple is sent, its value is XORed into the checksum, and each time a tuple is acked its value is XORed in again. If all tuples have been successfully acked, the checksum will be zero (the odds that the checksum will be zero otherwise are vanishingly small).
 
 You can read a bit more about the [reliability mechanism](Guaranteeing-message-processing.html#what-is-storms-reliability-api) elsewhere on the wiki -- this explains the internal details.
 
 ### acker `execute()`
 
-The acker is actually a regular bolt, with its  [execute method](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/acker.clj#L36) defined withing `mk-acker-bolt`.  When a new tupletree is born, the spout sends the XORed edge-ids of each tuple recipient, which the acker records in its `pending` ledger. Every time an executor acks a tuple, the acker receives a partial checksum that is the XOR of the tuple's own edge-id (clearing it from the ledger) and the edge-id of each downstream tuple the executor emitted (thus entering them into the ledger).
+The acker is actually a regular bolt.  When a new tupletree is born, the spout sends the XORed edge-ids of each tuple recipient, which the acker records in its `pending` ledger. Every time an executor acks a tuple, the acker receives a partial checksum that is the XOR of the tuple's own edge-id (clearing it from the ledger) and the edge-id of each downstream tuple the executor emitted (thus entering them into the ledger).
 
 This is accomplished as follows.
 
@@ -17,7 +20,7 @@ On a tick tuple, just advance pending tupletree checksums towards death and retu
 * on ack:  xor the partial checksum into the existing checksum value
 * on fail: just mark it as failed
 
-Next, [put the record](https://github.com/apache/incubator-storm/blob/46c3ba7/storm-core/src/clj/backtype/storm/daemon/acker.clj#L50)), into the RotatingMap (thus resetting is countdown to expiry) and take action:
+Next, put the record into the RotatingMap (thus resetting its countdown to expiry) and take action:
 
 * if the total checksum is zero, the tupletree is complete: remove it from the pending collection and notify the spout of success
 * if the tupletree has failed, it is also complete: remove it from the pending collection and notify the spout of failure
@@ -26,7 +29,7 @@ Finally, pass on an ack of our own.
 
 ### Pending tuples and the `RotatingMap`
 
-The acker stores pending tuples in a [`RotatingMap`](https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/utils/RotatingMap.java#L19), a simple device used in several places within Storm to efficiently time-expire a process.
+The acker stores pending tuples in a [`RotatingMap`]({{page.git-blob-base}}/storm-client/src/jvm/org/apache/storm/utils/RotatingMap.java), a simple device used in several places within Storm to efficiently time-expire a process.
 
 The RotatingMap behaves as a HashMap, and offers the same O(1) access guarantees.
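The XOR bookkeeping described in the patched page above can be sketched in plain Java. This is a toy model for illustration only, not Storm's actual `Acker` class: `AckLedger` and its methods are hypothetical names, and the edge ids below stand in for the random 64-bit ids Storm assigns to each tuple edge.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the acker's ledger: every tuple edge gets a 64-bit id; both the
// send and the ack of an edge XOR that id into the tree's checksum, so a fully
// acked tree cancels out to zero.
public class AckLedger {
    private final Map<Long, Long> pending = new HashMap<>(); // tree id -> running checksum

    // Fold a partial checksum (an XOR of edge ids) into the tree's entry.
    public void update(long treeId, long partialChecksum) {
        pending.merge(treeId, partialChecksum, (a, b) -> a ^ b);
    }

    // A tree whose checksum is zero has had every emitted tuple acked.
    public boolean isComplete(long treeId) {
        return pending.getOrDefault(treeId, 0L) == 0L;
    }

    public static void main(String[] args) {
        AckLedger ledger = new AckLedger();
        long tree = 1L, rootEdge = 0xCAFEL, childEdge = 0xBEEFL;
        ledger.update(tree, rootEdge);             // spout emits the root tuple
        ledger.update(tree, rootEdge ^ childEdge); // bolt acks root, emits one child
        ledger.update(tree, childEdge);            // bolt acks child, emits nothing
        System.out.println(ledger.isComplete(tree)); // prints "true"
    }
}
```

Because XOR is commutative and associative, it does not matter in which order the sends and acks arrive; only that each edge id appears an even number of times.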
 

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/DSLs-and-multilang-adapters.md
----------------------------------------------------------------------
diff --git a/docs/DSLs-and-multilang-adapters.md b/docs/DSLs-and-multilang-adapters.md
index 0ed5450..917b419 100644
--- a/docs/DSLs-and-multilang-adapters.md
+++ b/docs/DSLs-and-multilang-adapters.md
@@ -3,8 +3,9 @@ title: Storm DSLs and Multi-Lang Adapters
 layout: documentation
 documentation: true
 ---
+* [Clojure DSL](Clojure-DSL.html)
 * [Scala DSL](https://github.com/velvia/ScalaStorm)
 * [JRuby DSL](https://github.com/colinsurprenant/redstorm)
-* [Clojure DSL](Clojure-DSL.html)
 * [Storm/Esper integration](https://github.com/tomdz/storm-esper): Streaming SQL on top of Storm
 * [io-storm](https://github.com/dan-blanchard/io-storm): Perl multilang adapter
+* [FsShelter](https://github.com/Prolucid/FsShelter): F# DSL and runtime with protobuf multilang

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/Implementation-docs.md
----------------------------------------------------------------------
diff --git a/docs/Implementation-docs.md b/docs/Implementation-docs.md
index 9eb91f5..93459f3 100644
--- a/docs/Implementation-docs.md
+++ b/docs/Implementation-docs.md
@@ -8,6 +8,7 @@ This section of the wiki is dedicated to explaining how Storm is implemented. Yo
 - [Structure of the codebase](Structure-of-the-codebase.html)
 - [Lifecycle of a topology](Lifecycle-of-a-topology.html)
 - [Message passing implementation](Message-passing-implementation.html)
+- [Acking framework implementation](Acking-framework-implementation.html)
 - [Metrics](Metrics.html)
 - [Nimbus HA](nimbus-ha-design.html)
 - [Storm SQL](storm-sql-internal.html)

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index 64cd15d..1b5c621 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -71,7 +71,7 @@ But small change will not affect the user experience. We will notify the user wh
 
 * [Serialization](Serialization.html)
 * [Common patterns](Common-patterns.html)
-* [Clojure DSL](Clojure-DSL.html)
+* [DSLs and multilang adapters](DSLs-and-multilang-adapters.html)
 * [Using non-JVM languages with Storm](Using-non-JVM-languages-with-Storm.html)
 * [Distributed RPC](Distributed-RPC.html)
 * [Transactional topologies](Transactional-topologies.html)
@@ -95,16 +95,18 @@ But small change will not affect the user experience. We will notify the user wh
 * [Apache Hive Integration](storm-hive.html)
 * [Apache Solr Integration](storm-solr.html)
 * [Apache Cassandra Integration](storm-cassandra.html)
+* [Apache RocketMQ Integration](storm-rocketmq.html)
 * [JDBC Integration](storm-jdbc.html)
 * [JMS Integration](storm-jms.html)
+* [MQTT Integration](storm-mqtt.html)
 * [Redis Integration](storm-redis.html)
 * [Event Hubs Intergration](storm-eventhubs.html)
 * [Elasticsearch Integration](storm-elasticsearch.html)
-* [MQTT Integration](storm-mqtt.html)
 * [Mongodb Integration](storm-mongodb.html)
 * [OpenTSDB Integration](storm-opentsdb.html)
 * [Kinesis Integration](storm-kinesis.html)
 * [Druid Integration](storm-druid.html)
+* [PMML Integration](storm-pmml.html)
 * [Kestrel Integration](Kestrel-and-Storm.html)
 
 #### Container, Resource Management System Integration

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/storm-pmml.md
----------------------------------------------------------------------
diff --git a/docs/storm-pmml.md b/docs/storm-pmml.md
new file mode 100644
index 0000000..ad2d5b8
--- /dev/null
+++ b/docs/storm-pmml.md
@@ -0,0 +1,37 @@
+# Storm PMML Bolt
+ Storm integration to load PMML models and compute predictive scores for incoming tuples. The PMML model represents the machine learning (predictive) model used to do prediction on raw input data. The model is typically loaded into a runtime environment, which will score the raw data that comes in the tuples.
+
+# Create Instance of PMML Bolt
+ To create an instance of the `PMMLPredictorBolt`, you must provide the `ModelOutputs` and a `ModelRunner` created using a `ModelRunnerFactory`. The `ModelOutputs` represents the streams and output fields declared by the `PMMLPredictorBolt`. The `ModelRunner` represents the runtime environment that executes the predictive scoring. It has only one method:
+ 
+ ```java
+    Map<String, List<Object>> scoredTuplePerStream(Tuple input); 
+ ```
+ 
+ This method contains the logic to compute the scored tuples from the raw input tuple. It is up to the discretion of the implementation to define which scored values are assigned to each stream. The keys of this map are the stream ids, and the values are the predicted scores.
+   
+ The `PmmlModelRunner` is an extension of `ModelRunner` that represents the typical steps involved in predictive scoring. Hence, it allows implementations to **extract** raw inputs from the tuple, **pre-process** the raw inputs, and **predict** the scores from the preprocessed data.
+ 
+ The `JPmmlModelRunner` is an implementation of `PmmlModelRunner` that uses [JPMML](https://github.com/jpmml/jpmml) as its runtime environment. This implementation extracts the raw inputs from the tuple for all `active fields`, and builds a tuple with the predicted scores for the `predicted fields` and `output fields`.
+In this implementation all the declared streams will have the same scored tuple.
+ 
+ The `predicted`, `active`, and `output` fields are extracted from the PMML model.
+
+# Run Bundled Examples
+
+To run the examples you must execute the following command:
+ 
+ ```shell
+ STORM-HOME/bin/storm jar STORM-HOME/examples/storm-pmml-examples/storm-pmml-examples-2.0.0-SNAPSHOT.jar org.apache.storm.pmml.JpmmlRunnerTestTopology jpmmlTopology PMMLModel.xml RawInputData.csv
+ ```
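To make the `scoredTuplePerStream` contract above concrete, here is a minimal sketch in plain Java. It is not part of storm-pmml: `ThresholdModelRunner` is a hypothetical class, a plain `Map` stands in for Storm's `Tuple`, and the "model" is a toy threshold rather than a real PMML evaluation. It mirrors the behavior described for `JPmmlModelRunner`, where every declared stream receives the same scored tuple.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the scoredTuplePerStream contract (NOT part of
// storm-pmml): the class name is hypothetical, a plain Map stands in for
// Storm's Tuple, and scoring is a toy threshold instead of PMML evaluation.
public class ThresholdModelRunner {
    private final List<String> streams; // stream ids declared by the bolt

    public ThresholdModelRunner(List<String> streams) {
        this.streams = streams;
    }

    // Keys of the result are stream ids; values are the scored output tuple.
    // Mirroring JPmmlModelRunner, every declared stream gets the same tuple.
    public Map<String, List<Object>> scoredTuplePerStream(Map<String, Double> rawInputs) {
        double score = rawInputs.values().stream().mapToDouble(Double::doubleValue).sum();
        List<Object> scored = Arrays.asList(score, score > 1.0 ? "positive" : "negative");
        Map<String, List<Object>> result = new HashMap<>();
        for (String stream : streams) {
            result.put(stream, scored);
        }
        return result;
    }
}
```

A real implementation would replace the threshold with a call into a PMML evaluator and could just as well emit different scored values per stream.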

http://git-wip-us.apache.org/repos/asf/storm/blob/e80f9e20/docs/storm-rocketmq.md
----------------------------------------------------------------------
diff --git a/docs/storm-rocketmq.md b/docs/storm-rocketmq.md
new file mode 100644
index 0000000..17daf8c
--- /dev/null
+++ b/docs/storm-rocketmq.md
@@ -0,0 +1,94 @@
+# Storm RocketMQ
+
+Storm/Trident integration for [RocketMQ](https://rocketmq.incubator.apache.org/). This package includes the core spout, bolt and trident states, which allow a Storm topology to write tuples into a topic or read tuples from topics.
+
+
+## Read from Topic
+This package includes a spout for reading data from a topic.
+
+### RocketMQSpout
+To use the `RocketMQSpout`, construct an instance of it with a Properties instance that includes the RocketMQ configs.
+RocketMQSpout uses the RocketMQ MQPushConsumer as the default implementation. PushConsumer is a high-level consumer API that wraps the pulling details, so it appears as though the broker pushes messages to the consumer.
+RocketMQSpout retries failed messages 3 times (use `SpoutConfig.DEFAULT_MESSAGES_MAX_RETRY` to change the value).
+
+ ```java
+        Properties properties = new Properties();
+        properties.setProperty(SpoutConfig.NAME_SERVER_ADDR, nameserverAddr);
+        properties.setProperty(SpoutConfig.CONSUMER_GROUP, group);
+        properties.setProperty(SpoutConfig.CONSUMER_TOPIC, topic);
+
+        RocketMQSpout spout = new RocketMQSpout(properties);
+ ```
+
+
+## Write into Topic
+This package includes a bolt and a trident state for writing data into a topic.
+
+### TupleToMessageMapper
+The main API for mapping a Storm tuple to a RocketMQ Message is the `org.apache.storm.rocketmq.common.mapper.TupleToMessageMapper` interface:
+
+```java
+public interface TupleToMessageMapper extends Serializable {
+    String getKeyFromTuple(ITuple tuple);
+    byte[] getValueFromTuple(ITuple tuple);
+}
+```
+
+### FieldNameBasedTupleToMessageMapper
+`storm-rocketmq` includes a general-purpose `TupleToMessageMapper` implementation called `FieldNameBasedTupleToMessageMapper`.
+
+### TopicSelector
+The main API for selecting the topic and tags is the `org.apache.storm.rocketmq.common.selector.TopicSelector` interface:
+
+```java
+public interface TopicSelector extends Serializable {
+    String getTopic(ITuple tuple);
+    String getTag(ITuple tuple);
+}
+```
+
+### DefaultTopicSelector/FieldNameBasedTopicSelector
+`storm-rocketmq` includes general-purpose `TopicSelector` implementations called `DefaultTopicSelector` and `FieldNameBasedTopicSelector`.
+
+
+### RocketMQBolt
+To use the `RocketMQBolt`, construct an instance of it with TupleToMessageMapper, TopicSelector and Properties instances.
+RocketMQBolt sends messages asynchronously by default. You can change this by invoking `withAsync(false)`.
+
+ ```java
+        TupleToMessageMapper mapper = new FieldNameBasedTupleToMessageMapper("word", "count");
+        TopicSelector selector = new DefaultTopicSelector(topic);
+
+        Properties properties = new Properties();
+        properties.setProperty(RocketMQConfig.NAME_SERVER_ADDR, nameserverAddr);
+
+        RocketMQBolt insertBolt = new RocketMQBolt()
+                .withMapper(mapper)
+                .withSelector(selector)
+                .withProperties(properties);
+ ```
+
+### Trident State
+We support a Trident persistent state that can be used with Trident topologies. To create a RocketMQ persistent Trident state you need to initialize it with the TupleToMessageMapper, TopicSelector, and Properties instances. See the example below:
+
+ ```java
+        TupleToMessageMapper mapper = new FieldNameBasedTupleToMessageMapper("word", "count");
+        TopicSelector selector = new DefaultTopicSelector(topic);
+
+        Properties properties = new Properties();
+        properties.setProperty(RocketMQConfig.NAME_SERVER_ADDR, nameserverAddr);
+
+        RocketMQState.Options options = new RocketMQState.Options()
+                .withMapper(mapper)
+                .withSelector(selector)
+                .withProperties(properties);
+
+        StateFactory factory = new RocketMQStateFactory(options);
+
+        TridentTopology topology = new TridentTopology();
+        Stream stream = topology.newStream("spout1", spout);
+
+        stream.partitionPersist(factory, fields,
+                new RocketMQStateUpdater(), new Fields());
+ ```
+
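The `TupleToMessageMapper` contract shown earlier can be illustrated with a small self-contained sketch. This is not the actual `FieldNameBasedTupleToMessageMapper` from storm-rocketmq: `FieldNameMapperSketch` is a hypothetical name, and a plain `Map` stands in for Storm's `ITuple` so the example runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch of what a field-name-based mapper does: pick one tuple field as the
// RocketMQ message key and another as the message body. A plain Map replaces
// Storm's ITuple so the example is self-contained; the real
// FieldNameBasedTupleToMessageMapper implements TupleToMessageMapper.
public class FieldNameMapperSketch {
    private final String keyField;
    private final String valueField;

    public FieldNameMapperSketch(String keyField, String valueField) {
        this.keyField = keyField;
        this.valueField = valueField;
    }

    // The message key, taken from the configured key field of the tuple.
    public String getKeyFromTuple(Map<String, Object> tuple) {
        return String.valueOf(tuple.get(keyField));
    }

    // The message body as bytes, taken from the configured value field.
    public byte[] getValueFromTuple(Map<String, Object> tuple) {
        return String.valueOf(tuple.get(valueField)).getBytes(StandardCharsets.UTF_8);
    }
}
```

Constructed as `new FieldNameMapperSketch("word", "count")`, this mirrors the mapper used in the `RocketMQBolt` example above: the `word` field becomes the message key and the `count` field becomes the message body.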

