spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: How to gracefully handle Kafka OffsetOutOfRangeException
Date Mon, 21 Mar 2016 14:55:20 GMT
Spark Streaming in general will retry a failed batch N times and then move
on to the next one... off the top of my head, I'm not sure why
checkpointing would have an effect on that.
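
If what you're hitting is the task-retry limit, the knob (a sketch,
assuming the standard task-retry config is what applies here) is
spark.task.maxFailures:

  import org.apache.spark.SparkConf

  // Number of times a task is retried before the stage, and hence the
  // batch job, is failed. Default is 4.
  val conf = new SparkConf().set("spark.task.maxFailures", "8")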

On Mon, Mar 21, 2016 at 3:25 AM, Ramkumar Venkataraman
<ram.the.monk@gmail.com> wrote:
> Thanks Cody for the quick help. Yes, the exception is happening in the
> executors during processing. I will look into cloning the KafkaRDD and
> swallowing the exception.
>
> But something weird is happening: when I enable checkpointing on the job,
> my job doesn't crash; it happily proceeds with the next batch, even though
> I see tons of exceptions in the executor logs. So the question is: why
> doesn't the Spark job crash when checkpointing is enabled?
>
> I have my code pasted here:
> https://gist.github.com/ramkumarvenkat/00f4fc63f750c537defd
>
> I am not too sure if this is an issue with the Spark core engine or with
> the streaming module. Please let me know if you need more logs or if you
> want me to raise a GitHub issue/JIRA.
>
> Sorry for digressing on the original thread.
>
> On Fri, Mar 18, 2016 at 8:10 PM, Cody Koeninger <cody@koeninger.org> wrote:
>>
>> Is that happening only at startup, or during processing?  If that's
>> happening during normal operation of the stream, you don't have enough
>> resources to process the stream in time.
>>
>> There's not a clean way to deal with that situation, because it's a
>> violation of preconditions.  If you want to modify the code to do what
>> makes sense for you, start by looking at handleFetchErr in
>> KafkaRDD.scala. Recompiling that package isn't a big deal, because it
>> isn't part of the core Spark deployment, so you'll only have to change
>> your job, not the deployed version of Spark.
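>>
>> For concreteness, here's an untested sketch of the kind of change I
>> mean, based on the 1.x external/kafka source (exact field and method
>> names may differ in your version):
>>
>>   // In your cloned KafkaRDD.scala: treat OffsetOutOfRange as data
>>   // loss you accept, instead of throwing and failing the task.
>>   private def handleFetchErr(resp: FetchResponse) {
>>     if (resp.hasError) {
>>       val err = resp.errorCode(part.topic, part.partition)
>>       if (err == ErrorMapping.OffsetOutOfRangeCode) {
>>         // Log and fast-forward past the gap so the iterator sees
>>         // this partition as exhausted.
>>         logWarning(s"Offset out of range on ${part.topic}-${part.partition}, skipping")
>>         requestOffset = part.untilOffset
>>       } else {
>>         // Let normal rdd retry sort out reconnect attempts
>>         throw ErrorMapping.exceptionFor(err)
>>       }
>>     }
>>   }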
>>
>>
>>
>> On Fri, Mar 18, 2016 at 6:16 AM, Ramkumar Venkataraman
>> <ram.the.monk@gmail.com> wrote:
>> > I am using Spark Streaming and reading data from Kafka with
>> > KafkaUtils.createDirectStream. I have "auto.offset.reset" set to
>> > smallest.
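>> >
>> > For reference, the stream is set up roughly like this (broker list,
>> > topic name, and the StreamingContext ssc are placeholders, not my
>> > real config):
>> >
>> >   import kafka.serializer.StringDecoder
>> >   import org.apache.spark.streaming.kafka.KafkaUtils
>> >
>> >   val kafkaParams = Map(
>> >     "metadata.broker.list" -> "broker1:9092",
>> >     "auto.offset.reset" -> "smallest")
>> >   val stream = KafkaUtils.createDirectStream[String, String,
>> >     StringDecoder, StringDecoder](ssc, kafkaParams, Set("my-topic"))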
>> >
>> > But for some Kafka partitions I get
>> > kafka.common.OffsetOutOfRangeException and my Spark job crashes.
>> >
>> > I want to understand if there is a graceful way to handle this
>> > failure without killing the job. I want to keep ignoring these
>> > exceptions, as some other partitions are fine and I am okay with
>> > data loss.
>> >
>> > Is there any way to handle this so that my Spark job doesn't crash?
>> > I have no option of increasing the Kafka retention period.
>> >
>> > I tried wrapping the DStream returned by createDirectStream() in a
>> > Try construct, but since the exception happens on the executors, the
>> > Try had no effect. Do you have any ideas on how to handle this?
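>> >
>> > In case it helps, what I tried looks roughly like this (same type
>> > parameters as the stream above); the Try only guards stream
>> > construction on the driver, which is why it never sees the
>> > exception thrown later on the executors:
>> >
>> >   import scala.util.Try
>> >
>> >   val attempt = Try(KafkaUtils.createDirectStream[String, String,
>> >     StringDecoder, StringDecoder](ssc, kafkaParams, Set("my-topic")))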

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

