spark-user mailing list archives

From: Andrii Biletskyi <andrii.bilets...@yahoo.com.INVALID>
Subject: Spark Streaming: Async action scheduling inside foreachRDD
Date: Wed, 02 Aug 2017 20:24:17 GMT
Hi all,

What is the correct way to schedule multiple jobs inside the foreachRDD method
and, importantly, to await their results to ensure those jobs have completed
successfully?
E.g.:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import com.datastax.spark.connector._   // spark-cassandra-connector

kafkaDStream.foreachRDD { rdd =>
  val rdd1 = rdd.map(...)
  val rdd2 = rdd1.map(...)

  // Schedule the two output actions concurrently.
  val job1Future = Future {
    rdd1.saveToCassandra(...)
  }

  val job2Future = Future {
    rdd1.foreachPartition(iter => /* save to Kafka */ ())
  }

  // Block until both jobs have completed (rethrows if either one fails).
  Await.result(
    Future.sequence(Seq(job1Future, job2Future)),
    Duration.Inf)

  // commit Kafka offsets
}

In this code I'm scheduling two actions in futures and awaiting them. I
need to be sure, when I commit the Kafka offsets at the end of the batch,
that job1 and job2 have actually executed successfully. Does this approach
provide that guarantee? I.e., if one of the jobs fails, will the entire
batch be marked as failed too?
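For reference, here is a minimal, Spark-free sketch of the failure semantics I'm
relying on (the object name and the dummy jobs are just for illustration):
Await.result on Future.sequence rethrows the first failure, so, assuming the same
holds inside foreachRDD, the offset commit after it would never run.

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration
import scala.util.Try

// Illustration only: awaiting a sequence of futures rethrows the first
// failure, so any code after the Await (e.g. an offset commit) is skipped.
object AwaitFailureDemo extends App {
  val ok     = Future { "write succeeded" }
  val broken = Future[String] { throw new RuntimeException("Kafka write failed") }

  val outcome = Try(Await.result(Future.sequence(Seq(ok, broken)), Duration.Inf))

  // Prints: Failure(java.lang.RuntimeException: Kafka write failed)
  println(outcome)
}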


Thanks,
Andrii
