spark-user mailing list archives

From Martin Eden <>
Subject SparkStreaming multiple output operations failure semantics / error propagation
Date Thu, 14 Jul 2016 22:04:37 GMT

I have a Spark 1.6.2 streaming job with multiple output operations (jobs)
doing idempotent changes in different repositories.

The problem is that I want to somehow pass errors from one output operation
to the next, so that each output operation only updates messages that were
successfully processed by the previous ones. This has to propagate all the
way to the last job, which should ACK only the successfully processed
messages to the input queue, leaving the unsuccessful ones un-ACKed.

The overall desired behaviour is best effort / fail fast, leaving the
messages which were not successfully processed by all output operations in
the input queue for retrying later.
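To make the intent concrete, here is a minimal sketch of the per-stage filtering I have in mind, written in plain Python independent of any Spark API (all names are illustrative, not real Spark calls): each output operation only sees messages that survived the previous operations, and the final surviving set is what would be ACKed.

```python
# Hypothetical sketch (NOT the Spark API): model each output operation as a
# per-message function that may raise, thread the set of still-successful
# message ids through the stages, and ACK only ids that survived every stage.

def run_stage(messages, alive, op):
    """Run one output operation over the still-successful messages;
    return the ids that succeeded in this stage."""
    survived = set()
    for msg_id, payload in messages:
        if msg_id not in alive:
            continue  # failed an earlier stage: skip it here
        try:
            op(payload)
            survived.add(msg_id)
        except Exception:
            pass  # best effort: drop from the surviving set, retry later
    return survived

def survivors(messages, stages):
    """Ids that passed all stages -- the set the last job should ACK."""
    alive = {msg_id for msg_id, _ in messages}
    for op in stages:
        alive = run_stage(messages, alive, op)
    return alive
```

In real Spark Streaming terms, `messages` would be the records of a batch and each `op` one of the idempotent writes to a repository; the open question is how to carry the `alive` set between output operations on the same batch.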

Is there a pattern for achieving this in SparkStreaming?

If not can SparkStreaming at least guarantee that if the previous
job/output operation in the batch fails, it does not execute the next
jobs/output operations?

Thanks in advance,
