storm-user mailing list archives

From Florian Hussonnois <fhussonn...@gmail.com>
Subject Re: Do I need Kafka to have a reliable Storm spout?
Date Wed, 25 Feb 2015 13:41:17 GMT
Hi,

Actually, the tuples aren't persisted in ZooKeeper. If your spout emits a
tuple with a unique id, the tuple is automatically tracked internally by
Storm (i.e., by the acker tasks). Thus, if the emitted tuple later fails
because of a bolt failure, Storm invokes the 'fail' method on the
originating spout task, passing the unique id as the argument.

It's then up to you to re-emit the failed tuple.

In sample code, spouts use a Map to track which tuples have not yet been
fully processed by the topology, so that they can be re-emitted after a
bolt failure.

However, if the failure comes not from a bolt but from your spout itself,
the in-memory Map is lost and your topology will not be able to re-emit
the failed tuples.

For such a scenario you can rely on Kafka. The Kafka spout stores its read
offset in ZooKeeper, so if a spout task goes down it can read its offset
back from ZooKeeper after restarting.
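The offset-checkpoint idea can be sketched as follows. This is a minimal illustration under a big simplifying assumption: a local file stands in for ZooKeeper (the real Kafka spout writes the offset to a ZooKeeper znode), and the class and method names are mine, not the spout's.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of offset checkpointing: persist the last fully processed
// offset outside the spout's memory so a restarted task can resume.
// A temp file stands in for ZooKeeper here.
public class OffsetCheckpoint {
    private final Path store;

    public OffsetCheckpoint(Path store) {
        this.store = store;
    }

    // Persist the last fully processed offset.
    public void commit(long offset) throws IOException {
        Files.write(store, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
    }

    // After a restart, resume from the persisted offset (or 0 if none).
    public long restore() throws IOException {
        if (!Files.exists(store)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(store), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("offset", ".txt");
        OffsetCheckpoint checkpoint = new OffsetCheckpoint(p);
        checkpoint.commit(42L);
        // Simulate a spout restart: a fresh instance reads the same store.
        OffsetCheckpoint restarted = new OffsetCheckpoint(p);
        System.out.println(restarted.restore()); // prints 42
    }
}
```

Because the offset lives outside the spout process, even a crash of the whole spout task loses nothing: at worst, tuples after the last committed offset are replayed.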

Hope this helps.

2015-02-24 18:58 GMT+01:00 Douglas Alan <doug@alum.mit.edu>:

> As I understand things, ZooKeeper will persist tuples emitted by bolts so
> if a bolt crashes (or a computer with the bolt crashes, or the entire
> cluster crashes), the tuple emitted by the bolt will not be lost. Once
> everything is restarted, the tuples will be fetched from ZooKeeper, and
> everything will continue on as if nothing bad ever happened.
>
> What I don't yet understand is if the same thing is true for spouts. If a
> spout emits a tuple (i.e., the emit() function within a spout is executed),
> and the computer the spout is running on crashes shortly thereafter, will
> that tuple be resurrected by ZooKeeper? Or do we need Kafka in order to
> guarantee this?
>
> |>oug
>
> P.S. I understand that the tuple emitted by the spout must be assigned a
> unique ID in the call to emit().
>
> P.P.S. I see sample code in books that uses something like
> ConcurrentHashMap<UUID, Values> to track which spouted tuples have not yet
> been acked. Is this somehow automatically persisted with ZooKeeper? I
> suspect not and if not, then I shouldn't really be doing that, should I?
> What should I be doing instead? Using Kafka?
>



-- 
Florian HUSSONNOIS
Tel +33 6 26 92 82 23
