spark-user mailing list archives

From Raghavendra Pandey <raghavendra.pan...@gmail.com>
Subject Re: Spark Streaming: Async action scheduling inside foreachRDD
Date Fri, 04 Aug 2017 13:54:32 GMT
Did you try SparkContext.addSparkListener?
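
For illustration, a minimal sketch of that idea (assuming Spark 2.x; the class and
message strings here are hypothetical, not from the original thread): register a
SparkListener that inspects each job's result as it finishes.

```scala
import org.apache.spark.scheduler.{JobSucceeded, SparkListener, SparkListenerJobEnd}

// Hypothetical listener that reports the outcome of every Spark job.
class JobOutcomeListener extends SparkListener {
  override def onJobEnd(jobEnd: SparkListenerJobEnd): Unit = {
    jobEnd.jobResult match {
      case JobSucceeded => println(s"Job ${jobEnd.jobId} succeeded")
      case failure      => println(s"Job ${jobEnd.jobId} failed: $failure")
    }
  }
}

// Register it before the streaming jobs run, e.g.:
//   sc.addSparkListener(new JobOutcomeListener)
```

Note this reports outcomes asynchronously; it does not by itself block the
foreachRDD body the way Await.result does.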



On Aug 3, 2017 1:54 AM, "Andrii Biletskyi"
<andrii.biletskyi@yahoo.com.invalid> wrote:

> Hi all,
>
> What is the correct way to schedule multiple jobs inside foreachRDD method
> and importantly await on result to ensure those jobs have completed
> successfully?
> E.g.:
>
> kafkaDStream.foreachRDD { rdd =>
>   val rdd1 = rdd.map(...)
>   val rdd2 = rdd1.map(...)
>
>   val job1Future = Future {
>     rdd1.saveToCassandra(...)
>   }
>
>   val job2Future = Future {
>     rdd1.foreachPartition(iter => /* save to Kafka */)
>   }
>
>   // Future.sequence takes a collection of futures
>   Await.result(
>     Future.sequence(Seq(job1Future, job2Future)),
>     Duration.Inf)
>
>   // commit Kafka offsets
> }
>
> In this code I'm scheduling two actions in futures and awaiting them. I
> need to be sure, when I commit Kafka offsets at the end of the batch
> processing, that job1 and job2 have actually executed successfully. Does
> the given approach provide these guarantees? That is, if one of the jobs
> fails, will the entire batch be marked as failed too?
>
>
> Thanks,
> Andrii
>
