spark-user mailing list archives

From Cody Koeninger
Subject Re: How to gracefully handle Kafka OffsetOutOfRangeException
Date Fri, 18 Mar 2016 14:40:22 GMT
Is that happening only at startup, or during processing?  If that's
happening during normal operation of the stream, you don't have enough
resources to process the stream in time: the job is falling far enough
behind that Kafka's retention deletes messages before they are read.
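
For the startup case, one option is to check the offsets the job is about to
resume from against what the brokers still hold, and clamp them before
creating the stream. The sketch below is only an illustration under
assumptions: TopicPartition, OffsetBounds and clamp are hypothetical names
(a real job would use kafka.common.TopicAndPartition and look up the
earliest and latest offsets itself), and it accepts skipping whatever has
already been deleted.

```scala
object OffsetClamp {
  // Hypothetical stand-ins, so the sketch needs no Kafka or Spark dependency.
  final case class TopicPartition(topic: String, partition: Int)
  final case class OffsetBounds(earliest: Long, latest: Long)

  // For every partition the brokers report, pick a starting offset that is
  // guaranteed to be inside the currently available range.
  def clamp(
      saved: Map[TopicPartition, Long],
      available: Map[TopicPartition, OffsetBounds]): Map[TopicPartition, Long] =
    available.map { case (tp, bounds) =>
      // No saved offset: start from the earliest message still retained.
      val wanted = saved.getOrElse(tp, bounds.earliest)
      tp -> math.min(math.max(wanted, bounds.earliest), bounds.latest)
    }

  def main(args: Array[String]): Unit = {
    val tp0 = TopicPartition("events", 0)
    val tp1 = TopicPartition("events", 1)
    val saved = Map(tp0 -> 5L, tp1 -> 150L)
    val available = Map(tp0 -> OffsetBounds(100L, 200L), tp1 -> OffsetBounds(100L, 200L))
    // tp0 is moved forward to 100 (its saved offset has aged out); tp1 stays at 150.
    println(clamp(saved, available))
  }
}
```

The resulting map is the kind of value you would pass as fromOffsets to the
createDirectStream overload that takes explicit starting offsets. This does
nothing for the other case, where offsets age out while the stream is running.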

There's not a clean way to deal with that situation, because it's a
violation of preconditions.  If you want to modify the code to do what
makes sense for you, start looking at handleFetchErr in KafkaRDD.scala.
Recompiling that package isn't a big deal, because it's not a part
of the core spark deployment, so you'll only have to change your job,
not the deployed version of spark.
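
Part of why the change has to live in the fetch code rather than in the
driver: the stream is only a lazy description of work, so a Try around its
creation never sees an exception that is thrown later, when the messages are
actually pulled. A minimal sketch of that effect in plain Scala, with no
Spark or Kafka involved; OffsetOutOfRange and fetch are hypothetical
stand-ins for the real exception and the real partition fetch.

```scala
import scala.util.Try

object LazyTryDemo {
  // Hypothetical stand-in for kafka.common.OffsetOutOfRangeException.
  final class OffsetOutOfRange(msg: String) extends RuntimeException(msg)

  // Hypothetical stand-in for a partition fetch. Building the iterator does
  // no work; the failure only surfaces when the third element is pulled.
  def fetch(partition: Int): Iterator[String] =
    Iterator.tabulate(5) { i =>
      if (i == 2) throw new OffsetOutOfRange(s"partition $partition, offset $i")
      s"msg-$i"
    }

  // Wraps only the construction, like a Try around stream creation.
  def wrapConstruction(partition: Int): Try[Iterator[String]] =
    Try(fetch(partition))

  // Wraps the consumption, which is where the exception is thrown.
  def wrapConsumption(partition: Int): Try[List[String]] =
    Try(fetch(partition).toList)

  def main(args: Array[String]): Unit = {
    println(s"construction succeeded: ${wrapConstruction(0).isSuccess}")
    println(s"consumption succeeded: ${wrapConsumption(0).isSuccess}")
  }
}
```

The first Try reports success even though the data is unreadable; only the
second one, wrapped around the code that iterates, catches the failure. In
the streaming job that iteration happens inside the task on the executor.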

On Fri, Mar 18, 2016 at 6:16 AM, Ramkumar Venkataraman wrote:
> I am using Spark streaming and reading data from Kafka using
> KafkaUtils.createDirectStream. I have the "auto.offset.reset" set to
> smallest.
> But in some Kafka partitions, I get kafka.common.OffsetOutOfRangeException
> and my spark job crashes.
> I want to understand if there is a graceful way to handle this failure and
> not kill the job. I want to keep ignoring these exceptions, as some other
> partitions are fine and I am okay with data loss.
> Is there any way to handle this and not have my spark job crash? I have no
> option of increasing the kafka retention period.
> I tried to have the DStream returned by createDirectStream() wrapped in a
> Try construct, but since the exception happens in the executor, the Try
> construct didn't take effect. Do you have any ideas of how to handle this?
