storm-user mailing list archives

From Douglas Alan <d...@alum.mit.edu>
Subject Re: Do I need Kafka to have a reliable Storm spout?
Date Thu, 26 Feb 2015 16:12:45 GMT
Thank you, Florian, for your very clear and helpful response!

|>oug


> From: Florian Hussonnois <fhussonnois@gmail.com>
> To: user@storm.apache.org
> Cc:
> Date: Wed, 25 Feb 2015 14:41:17 +0100
> Subject: Re: Do I need Kafka to have a reliable Storm spout?
> Hi,
> Actually, the tuples aren't persisted in ZooKeeper. If your spout
> emits a tuple with a unique id, Storm tracks it internally (via the
> ackers). Thus, if the emitted tuple fails because of a bolt failure,
> Storm invokes the 'fail' method on the originating spout task with the
> unique id as argument.
>


> It's then up to you to re-emit the failed tuple.
>


> In sample code, spouts use a Map to track which tuples have not yet been
> fully processed by the entire topology, so that they can be re-emitted in
> case of a bolt failure.
>
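
To make the Map-based bookkeeping described above concrete, here is a minimal sketch in plain Java. It deliberately does not use any Storm classes so that it runs standalone: `PendingTupleTracker`, its `emit`/`ack`/`fail` methods, and the `emitted` list (a stand-in for the spout's output collector) are all hypothetical names chosen for illustration. In a real spout the same logic would sit in the spout's own emit path and in its `ack` and `fail` callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a reliable spout's bookkeeping; not a Storm class.
class PendingTupleTracker {
    // Tuples that were emitted but not yet acked, keyed by their message id.
    private final Map<UUID, String> pending = new ConcurrentHashMap<>();

    // Stand-in for the output collector: records every emission and re-emission.
    final List<String> emitted = new ArrayList<>();

    // Emit a payload under a fresh unique id and remember it until it is acked.
    UUID emit(String payload) {
        UUID id = UUID.randomUUID();
        pending.put(id, payload);
        emitted.add(payload);
        return id;
    }

    // The tuple tree completed: the payload no longer needs to be kept.
    void ack(UUID id) {
        pending.remove(id);
    }

    // The tuple failed downstream: re-emit it under the same id.
    void fail(UUID id) {
        String payload = pending.get(id);
        if (payload != null) {
            emitted.add(payload);
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Note that `pending` lives only in the memory of the process that owns it, so a crash of that process loses every un-acked payload.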


> However, if the failure doesn't come from a bolt but from your spout, the
> in-memory Map will be lost and your topology will not be able to re-emit
> failed tuples.
>


> For such a scenario you can rely on Kafka. In fact, the Kafka spout stores
> its read offset in ZooKeeper. That way, if a spout task goes down, it
> will be able to read its offset from ZooKeeper after restarting.
> Hope this helps.
>
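
The offset-checkpointing idea can be sketched roughly as follows. This is not the Kafka spout's actual implementation: `CheckpointedReader` is a hypothetical class, an in-memory `List` stands in for a replayable source such as a Kafka partition, and a local file stands in for the durable offset store (ZooKeeper, in the case described above). The point is only that a restarted reader resumes from the last committed offset, so anything read but not committed is read again.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of a reader that checkpoints its read offset durably.
class CheckpointedReader {
    private final List<String> log;   // stand-in for a replayable source
    private final Path offsetFile;    // stand-in for the durable offset store
    private long offset;              // index of the next message to read

    CheckpointedReader(List<String> log, Path offsetFile) {
        this.log = log;
        this.offsetFile = offsetFile;
        this.offset = loadOffset();
    }

    // Resume from the last committed offset, or from 0 on the first start.
    private long loadOffset() {
        if (!Files.exists(offsetFile)) {
            return 0L;
        }
        try {
            byte[] raw = Files.readAllBytes(offsetFile);
            return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Return the next message, or null when the source is exhausted.
    String next() {
        if (offset >= log.size()) {
            return null;
        }
        return log.get((int) offset++);
    }

    // Persist the current offset; call this once earlier messages are fully processed.
    void commit() {
        try {
            Files.write(offsetFile, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long position() {
        return offset;
    }
}
```

If a reader calls `next()` twice but commits only after the first call and then crashes, a new `CheckpointedReader` built on the same file starts again at the second message. That gives replay of uncommitted messages rather than loss, at the cost of possible duplicates.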


> 2015-02-24 18:58 GMT+01:00 Douglas Alan <doug@alum.mit.edu>:
>
> As I understand things, ZooKeeper will persist tuples emitted by bolts so
> if a bolt crashes (or a computer with the bolt crashes, or the entire
> cluster crashes), the tuple emitted by the bolt will not be lost. Once
> everything is restarted, the tuples will be fetched from ZooKeeper, and
> everything will continue on as if nothing bad ever happened.
>
> What I don't yet understand is if the same thing is true for spouts. If a
> spout emits a tuple (i.e., the emit() function within a spout is executed),
> and the computer the spout is running on crashes shortly thereafter, will
> that tuple be resurrected by ZooKeeper? Or do we need Kafka in order to
> guarantee this?
>
> |>oug
>
> P.S. I understand that the tuple emitted by the spout must be assigned a
> unique ID in the call to emit().
>
> P.P.S. I see sample code in books that uses something like
> ConcurrentHashMap<UUID, Values> to track which spouted tuples have not yet
> been acked. Is this somehow automatically persisted with ZooKeeper? I
> suspect not and if not, then I shouldn't really be doing that, should I?
> What should I be doing instead? Using Kafka?
>
>
