spark-dev mailing list archives

From Cody Koeninger <>
Subject Re: Error recovery strategies using the DirectKafkaInputDStream
Date Thu, 14 May 2015 22:43:18 GMT
I think as long as you have adequate monitoring and Kafka retention, the
simplest solution is safest - let it crash.
 On May 14, 2015 4:00 PM, "badgerpants" <> wrote:

> We've been using the new DirectKafkaInputDStream to implement an
> exactly-once processing solution that tracks the provided offset ranges
> within the same transaction that persists our data results. When an
> exception is thrown within the processing loop and the configured number
> of retries is exhausted, the stream skips to the end of the failed range
> of offsets and continues on with the next RDD.
> That makes sense, but we're wondering how others would handle recovering
> from failures. In our case the cause of the exception was a temporary
> outage of a needed service. Since the transaction rolled back at the
> point of failure, our offset tracking table retained the correct offsets,
> so we simply needed to restart the Spark process, whereupon it happily
> picked up at the correct point and continued. Short of a restart, do
> people have any good ideas for how we might recover?
> FWIW, we've looked at setting the spark.task.maxFailures param to a large
> value and looked for a property that would increase the wait between
> attempts. That might mitigate the issue when the availability problem is
> short-lived, but it wouldn't completely eliminate the need to restart.
> Any thoughts, ideas welcome.
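
The exactly-once pattern described above - committing the batch results and the Kafka offset ranges in the same transaction, so that a rollback leaves the offset table pointing at the batch to reprocess after a restart - can be sketched, Spark and Kafka aside, roughly as follows. All names here (OffsetStoreSketch, commit, startingOffset) are illustrative, not part of any Spark or Kafka API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, Spark-free sketch: results and offsets are persisted
// together, so a failed batch leaves both untouched ("rollback"), and a
// restarted job resumes from the last committed untilOffset.
class OffsetStoreSketch {
    // "topic:partition" -> next offset to read (the committed untilOffset)
    static final Map<String, Long> committedOffsets = new HashMap<>();
    static final List<String> results = new ArrayList<>();

    // Commit results and offsets atomically: if work.get() throws,
    // neither the results nor the offset table are updated.
    static void commit(String topicPartition, long untilOffset,
                       Supplier<String> work) {
        String output = work.get();   // may throw -> nothing persisted
        results.add(output);
        committedOffsets.put(topicPartition, untilOffset);
    }

    static long startingOffset(String topicPartition) {
        return committedOffsets.getOrDefault(topicPartition, 0L);
    }

    public static void main(String[] args) {
        commit("events:0", 100L, () -> "aggregated 0-100");

        // Simulate a temporary outage of a downstream service on the
        // second batch, after retries are exhausted.
        try {
            commit("events:0", 200L, () -> {
                throw new RuntimeException("service unavailable");
            });
        } catch (RuntimeException e) {
            // The failed commit changed nothing; a restarted job would
            // resume at offset 100 and reprocess the failed range.
        }
        System.out.println(startingOffset("events:0"));

        commit("events:0", 200L, () -> "aggregated 100-200");
        System.out.println(startingOffset("events:0"));
    }
}
```

Because the offsets only advance when the results commit, reprocessing after a crash replays exactly the failed range, which is what makes "let it crash" safe here as long as Kafka retention covers the outage.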
