spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Lost leader exception in Kafka Direct for Streaming
Date Thu, 01 Oct 2015 14:18:14 GMT
Did you check your Kafka broker logs to see what was going on during that
time?

The direct stream will handle normal leader loss / rebalance by retrying
tasks.
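For reference, a rough sketch of the settings that bound this retry behaviour. The values are the defaults as I understand them: spark.task.maxFailures=4 would match the "failed 4 times" in the stack trace, and the Kafka consumer setting refresh.leader.backoff.ms=200 would match the "sleeping for 200ms" log line. The helper spark_submit_conf_args is a hypothetical name used only for illustration, and whether raising any of these helps depends on how long your leader elections take.

```python
# Assumed defaults; check them against your Spark and Kafka versions.
SPARK_CONF = {
    "spark.task.maxFailures": "4",            # task attempts before the stage is aborted
    "spark.streaming.kafka.maxRetries": "1",  # driver retries when fetching latest offsets
}

# Passed in kafkaParams when creating the direct stream, not as Spark conf.
KAFKA_PARAMS = {
    "refresh.leader.backoff.ms": "200",  # pause before looking up a new leader
}


def spark_submit_conf_args(conf):
    """Render a conf dict as spark-submit --conf arguments, in sorted key order.

    Hypothetical helper for illustration; not part of Spark.
    """
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

Calling spark_submit_conf_args(SPARK_CONF) gives a list you can append to a spark-submit command line.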

But the exception you got indicates that something was wrong with Kafka,
such that offsets were being re-used.

i.e. your job had already processed up through beginning offset 15027734702

but when asking Kafka for the highest available offsets, it returned ending
offset 15027725493

which is lower. In other words, Kafka lost messages.  This might happen
because you lost a leader and recovered from a replica that wasn't in sync,
or someone manually screwed up a topic, or ... ?

If you really want to just blindly "recover" from this situation (even
though something is probably wrong with your data), the most
straightforward thing to do is monitor and restart your job.
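A minimal sketch of "monitor and restart", assuming the job is launched from a command line and exits non-zero when it aborts. The function name supervise, the restart limit, and the delay are all illustrative choices, not anything Spark provides. Whether the restarted job resumes from sane offsets depends on where your job keeps its offsets, which this does not address.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay_s=30.0,
              run=subprocess.call, sleep=time.sleep):
    """Run cmd and restart it after each non-zero exit.

    Stops when the command exits with 0 or after max_restarts restarts,
    and returns the last exit code. run and sleep are injectable so the
    loop can be exercised without launching a real process.
    """
    restarts = 0
    while True:
        code = run(cmd)
        if code == 0 or restarts >= max_restarts:
            return code
        restarts += 1
        sleep(delay_s)  # give the brokers time to settle before retrying
```

You would call it with your own launch command, e.g. supervise(["spark-submit", ...]) with the arguments you already use. In practice most people hand this job to an existing process supervisor or cluster manager rather than a hand-rolled loop.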




On Wed, Sep 30, 2015 at 4:31 PM, swetha <swethakasireddy@gmail.com> wrote:

>
> Hi,
>
> I see this sometimes in our Kafka Direct approach in our Streaming job. How
> do we make sure that the job recovers from such errors and works normally
> thereafter?
>
> 15/09/30 05:14:18 ERROR KafkaRDD: Lost leader for topic x_stream partition
> 19,  sleeping for 200ms
> 15/09/30 05:14:18 ERROR KafkaRDD: Lost leader for topic x_stream partition
> 5,  sleeping for 200ms
>
> Followed by every task failing with something like this:
>
> 15/09/30 05:26:20 ERROR Executor: Exception in task 4.0 in stage 84281.0
> (TID 818804)
> kafka.common.NotLeaderForPartitionException
>
> And:
>
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 15
> in stage 84958.0 failed 4 times, most recent failure: Lost task 15.3 in
> stage 84958.0 (TID 819461, 10.227.68.102): java.lang.AssertionError:
> assertion failed: Beginning offset 15027734702 is after the ending offset
> 15027725493 for topic hubble_stream partition 12. You either provided an
> invalid fromOffset, or the Kafka topic has been damaged
>
>
> Thanks,
> Swetha
