spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject Re: What is the difference between forEachAsync vs forEachPartitionAsync?
Date Mon, 03 Apr 2017 03:53:05 GMT
Wait, RDD operations should in fact execute in parallel, right? So if I call
rdd.forEachAsync, that should execute in parallel too, shouldn't it? I guess
I am a little confused about what the difference really is between
forEachAsync and forEachPartitionAsync, besides passing a Tuple vs. an
Iterator of Tuples to the lambda, respectively.
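
To make the Tuple vs. Iterator distinction concrete, here is a minimal plain-Java model of the two call shapes. It is not Spark code: the class AsyncForeachModel, the List<List<T>> stand-in for partitions, and the use of CompletableFuture on the common thread pool are all assumptions made for illustration. In Spark's Java API the real methods are spelled foreachAsync and foreachPartitionAsync on JavaRDD, and both return a JavaFutureAction<Void>. The sketch assumes, as I understand it, that both fan out over partitions in the same way and that "Async" only means the caller gets a future back instead of blocking.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Plain-Java model of the two call shapes; this is NOT Spark code.
// A "partition" is modeled as a List<T>, and "workers" as common-pool threads.
class AsyncForeachModel {

    // Models foreachPartitionAsync: f is invoked once per partition and
    // receives an iterator over that partition's elements.
    static <T> CompletableFuture<Void> foreachPartitionAsync(
            List<List<T>> partitions, Consumer<Iterator<T>> f) {
        CompletableFuture<?>[] tasks = partitions.stream()
                .map(p -> CompletableFuture.runAsync(() -> f.accept(p.iterator())))
                .toArray(CompletableFuture[]::new);
        // Returned immediately; completes once every partition task is done.
        return CompletableFuture.allOf(tasks);
    }

    // Models foreachAsync: the same fan-out, but f is invoked once per element.
    static <T> CompletableFuture<Void> foreachAsync(
            List<List<T>> partitions, Consumer<T> f) {
        return foreachPartitionAsync(partitions, it -> it.forEachRemaining(f));
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = List.of(List.of(1, 2, 3), List.of(4, 5));
        AtomicInteger elementCalls = new AtomicInteger();
        AtomicInteger partitionCalls = new AtomicInteger();

        CompletableFuture<Void> a =
                foreachAsync(partitions, x -> elementCalls.incrementAndGet());
        CompletableFuture<Void> b =
                foreachPartitionAsync(partitions, it -> partitionCalls.incrementAndGet());

        // The caller blocks only here, when it chooses to wait on the futures.
        a.join();
        b.join();
        System.out.println(elementCalls.get() + " element calls, "
                + partitionCalls.get() + " partition calls");
    }
}
```

In this model the per-element version is just the per-partition version with a loop inside, so the per-partition form mainly matters when there is setup worth doing once per partition (for example, opening one connection) rather than once per element.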

On Sun, Apr 2, 2017 at 8:36 PM, kant kodali <kanth909@gmail.com> wrote:

> Hi all,
>
> What is the difference between forEachAsync and forEachPartitionAsync? I
> couldn't find any comments in the Javadoc. If I were to guess, here is
> what I would say, but please correct me if I am wrong.
>
> forEachAsync: iterates through the values from all partitions one by one,
> asynchronously.
>
> forEachPartitionAsync: fans out each partition and runs the lambda for each
> partition in parallel across different workers. The lambda here iterates
> through the values from that partition one by one, asynchronously.
>
> Is this right, or am I completely wrong?
>
> Thanks!
>
