spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sathish Kumaran Vairavelu <vsathishkuma...@gmail.com>
Subject Re: Spark Streaming: Async action scheduling inside foreachRDD
Date Fri, 04 Aug 2017 15:25:50 GMT
Forkjoinpool with task support would help in this case. Where u can create
a thread pool with configured number of threads ( make sure u have enough
cores) and submit job I mean actions to the pool
On Fri, Aug 4, 2017 at 8:54 AM Raghavendra Pandey <
raghavendra.pandey@gmail.com> wrote:

> Did you try SparkContext.addSparkListener?
>
>
>
> On Aug 3, 2017 1:54 AM, "Andrii Biletskyi"
> <andrii.biletskyi@yahoo.com.invalid> wrote:
>
>> Hi all,
>>
>> What is the correct way to schedule multiple jobs inside foreachRDD
>> method and importantly await on result to ensure those jobs have completed
>> successfully?
>> E.g.:
>>
>> kafkaDStream.foreachRDD{ rdd =>
>> val rdd1 = rdd.map(...)
>> val rdd2 = rdd1.map(...)
>>
>> val job1Future = Future{
>> rdd1.saveToCassandra(...)
>> }
>>
>> val job2Future = Future{
>> rdd1.foreachPartition( iter => /* save to Kafka */)
>> }
>>
>>       Await.result(
>>       Future.sequence(job1Future, job2Future),
>>       Duration.Inf)
>>
>>
>>    // commit Kafka offsets
>> }
>>
>> In this code I'm scheduling two actions in futures and awaiting them. I
>> need to be sure when I commit Kafka offsets at the end of the batch
>> processing that job1 and job2 have actually executed successfully. Does
>> given approach provide these guarantees? I.e. in case one of the jobs fails
>> the entire batch will be marked as failed too?
>>
>>
>> Thanks,
>> Andrii
>>
>

Mime
View raw message