spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Spark streaming -> cassandra : Fault Tolerance
Date Wed, 09 Sep 2015 19:43:28 GMT
It's been a while since I've looked at the Cassandra connector, so I can't
give you specific advice on it.

But in general, if a Spark task fails (uncaught exception), it will be
retried automatically.  In the case of the Kafka direct stream RDD, the retry
will have exactly the same messages as the first attempt (as long as they're
still in the Kafka log).
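
To make the "same messages on retry" point concrete, here is a toy model in
plain Python. It is not Spark or Kafka code: LOG, read_range, run_with_retries
and write_batch are hypothetical names made up for illustration, and the
default of 4 attempts is only meant to mirror Spark's default for
spark.task.maxFailures. The idea it sketches is that the task is defined by a
fixed offset range, so every attempt reads the same slice of the log.

```python
from typing import Callable, List, Tuple

# Stand-in for one Kafka partition log; the list index plays the role of the offset.
LOG: List[str] = ["m0", "m1", "m2", "m3", "m4"]


def read_range(offset_range: Tuple[int, int]) -> List[str]:
    """Read a fixed [from, until) offset range, as a direct-stream task would."""
    start, until = offset_range
    return LOG[start:until]


def run_with_retries(task: Callable[[int], None], max_attempts: int = 4) -> int:
    """Re-run the task whenever it raises; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        try:
            task(attempt)
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: the failure surfaces to the caller
    raise AssertionError("max_attempts must be at least 1")


# Records what each attempt actually read, so the attempts can be compared.
seen: List[List[str]] = []


def write_batch(attempt: int) -> None:
    batch = read_range((1, 4))  # the offset range is fixed when the task is defined
    seen.append(batch)
    if attempt == 1:
        # Simulate the sink being unreachable on the first attempt only.
        raise ConnectionError("no host available")


attempts_used = run_with_retries(write_batch)
```

In this sketch the second attempt succeeds, and both entries in seen hold the
same three messages, because the range (1, 4) never changes between attempts.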

If you or the Cassandra connector are catching the exception, the task
won't be retried automatically and it's up to you to deal with it.
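
A rough illustration of the difference, again as a plain-Python toy model and
not Spark code: run_task, FlakySink, propagating_task and swallowing_task are
hypothetical names, and the retry loop only imitates the behaviour described
above. The task that lets the exception escape gets a second attempt; the one
that catches it looks successful to the scheduler and its record is never
written.

```python
from typing import Callable, List


def run_task(task: Callable[[], None], max_attempts: int = 4) -> int:
    """Return how many times the task was attempted; retry only when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            task()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: the failure surfaces to the caller
    raise AssertionError("max_attempts must be at least 1")


class FlakySink:
    """Hypothetical sink whose first `failures` writes raise ConnectionError."""

    def __init__(self, failures: int) -> None:
        self.remaining_failures = failures
        self.written: List[str] = []

    def write(self, record: str) -> None:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("no host available")
        self.written.append(record)


def propagating_task(sink: FlakySink) -> None:
    # The exception escapes the task, so the retry loop sees the failure.
    sink.write("m1")


def swallowing_task(sink: FlakySink) -> None:
    try:
        sink.write("m1")
    except ConnectionError:
        # Caught and ignored: from the outside the task looks successful.
        pass


sink_a = FlakySink(failures=1)
attempts_a = run_task(lambda: propagating_task(sink_a))

sink_b = FlakySink(failures=1)
attempts_b = run_task(lambda: swallowing_task(sink_b))
```

Here the propagating task takes two attempts and ends with "m1" written, while
the swallowing task finishes in one attempt with nothing written. If you do
catch the exception yourself, rethrowing it after logging keeps the automatic
retry in play.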



On Wed, Sep 9, 2015 at 2:09 PM, Samya <samya.maiti@amadeus.com> wrote:

> Hi Team,
>
> I have a sample Spark application which reads from Kafka using the direct
> API, then does some transformation & stores to Cassandra (using
> saveToCassandra(....)).
>
> If Cassandra goes down, then the application logs a NoHostAvailable
> exception (as expected). But in the meantime the new incoming messages are
> lost, as the Direct API creates a new checkpoint & deletes the previous ones.
>
> Does that mean, I should handle the exception at application side?
>
> Or is there any other hook to handle the same?
>
> Thanks in advance.
>
> Regards,
> Sam
