storm-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <mj...@apache.org>
Subject Re: How to let a topology know that it's time to stop?
Date Sat, 07 May 2016 15:57:11 GMT
As you mentioned already: Storm is designed to run topologies forever ;)
If you have finite data, why do you not use a batch processor???

As a workaround, you can embed "control messages" in your stream (or use
an additional stream for them).

If you want a topology to shut down itself, you could use
`NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
in your spout/bolt code.

Something like:
 - Spout emit all tuples
 - Spout emit special "end of stream" control tuple
 - Bolt1 processes everything
 - Bolt1 forward "end of stream" control tuple (when it received it)
 - Bolt2 processed everything
 - Bolt2 receives "end of stream" control tuple => flush to DB => kill
topology

But I guess, this is kinda weird pattern.

-Matthias

On 05/05/2016 06:13 AM, Navin Ipe wrote:
> Hi,
> 
> I know Storm is designed to run forever. I also know about Trident's
> technique of aggregation. But shouldn't Storm have a way to let bolts
> know that a certain bunch of processing has been completed?
> 
> Consider this topology:
> Spout------>Bolt-A------>Bolt-B
>             |                  |--->Bolt-B
>             |                  \--->Bolt-B
>             |--->Bolt-A------>Bolt-B
>             |                  |--->Bolt-B
>             |                  \--->Bolt-B
>             \--->Bolt-A------>Bolt-B
>                                |--->Bolt-B
>                                \--->Bolt-B
> 
>   * From Bolt-A to Bolt-B, it is a FieldsGrouping.
>   * Spout emits only a few tuples and then stops emitting.
>   * Bolt A takes those tuples and generates millions of tuples.
> 
> 
> *Bolt-B accumulates tuples that Bolt A sends and needs to know when
> Spout finished emitting. Only then can Bolt-B start writing to SQL.*
> 
> *Questions:*
> 1. How can all Bolts B be notified that it is time to write to SQL?
> 2. After all Bolts B have written to SQL, how to know that all Bolts B
> have completed writing?
> 3. How to stop the topology? I know of
> localCluster.killTopology("HelloStorm"), but shouldn't there be a way to
> do it from the Bolt?
> 
> -- 
> Regards,
> Navin


Mime
View raw message