spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dávid Szakállas <david.szakal...@risingstack.com>
Subject updateStateByKey for window batching
Date Mon, 22 Aug 2016 10:49:41 GMT
Hi!

I’m curious about the fault-tolerance properties of stateful streaming operations. I am
specifically interested about updateStateByKey.
What happens if a node fails during processing? Is the state recoverable?

Our use case is the following: we have messages arriving from a message queue about updating
a resource specified in the message. When such update request arrives, we wait a specific
amount of times and if in that window another update message arrives pointing in the same
resource, we batch these, and update the after the time elapsed since the first in this window
and update the resource. We thought about using updateStateByKey with key as the resource
identifier.

It is important to guarantee exactly once processing for the messages so every update should
happen, and no more than once.

Is it a good way to go?

Cheers,

--
David Szakallas | Software Engineer, RisingStack
Monitoring with Trace: http://trace.risingstack.com <http://trace.risingstack.com/>
http://risingstack.com <http://risingstack.com/> | http://blog.risingstack.com <http://blog.risingstack.com/>
Twitter: @szdavid92






Mime
View raw message