spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: foreachRDD execution
Date Fri, 27 Mar 2015 00:01:04 GMT
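Yes, that is the correct understanding: the foreachRDD work for successive
batches runs one batch at a time, never in parallel. There are undocumented
parameters that allow batches to run concurrently, but I do not recommend
using those :)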
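
To make the sequential behaviour concrete, here is a toy model in plain
Python. It is not Spark code: it only assumes that the output job of each
batch is handed to a worker pool with a single worker, and the names
run_batches and output_job are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time


def run_batches(num_batches, workers=1, work_seconds=0.05):
    """Toy model: submit one output job per batch, report the peak overlap."""
    lock = threading.Lock()
    running = 0   # jobs executing right now
    peak = 0      # highest number of jobs seen executing at once
    order = []    # batch ids in the order their jobs started

    def output_job(batch_id):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
            order.append(batch_id)
        time.sleep(work_seconds)  # stands in for the Cassandra join and save
        with lock:
            running -= 1

    # Leaving the "with" block waits for every submitted job to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch_id in range(num_batches):
            pool.submit(output_job, batch_id)
    return peak, order


peak, order = run_batches(4)
print(peak, order)
```

With one worker the jobs start in submission order and never overlap, so a
slow batch delays every batch queued behind it. Passing workers greater
than 1 models the concurrent case, where start order and overlap are no
longer predictable.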

TD

On Wed, Mar 25, 2015 at 6:57 AM, Luis Ángel Vicente Sánchez <
langel.groups@gmail.com> wrote:

> I have a simple and probably dumb question about foreachRDD.
>
> We are using Spark Streaming + Cassandra to compute concurrent users every
> 5 minutes. Our batch interval is 10 seconds and our block interval is
> 2.5 seconds.
>
> At the end of the pipeline we are using foreachRDD to join the data in the
> RDD with existing data in Cassandra, update the counters, and then save
> them back to Cassandra.
>
> To the best of my understanding, in this scenario, Spark Streaming
> produces one RDD every 10 seconds and foreachRDD processes them
> sequentially, that is, foreachRDD would never run in parallel.
>
> Am I right?
>
> Regards,
>
> Luis