spark-user mailing list archives

From Sonal Goyal <sonalgoy...@gmail.com>
Subject Re: Is RDD thread safe?
Date Tue, 19 Nov 2019 13:46:38 GMT
The RDD or DataFrame is distributed and partitioned by Spark so as to
leverage all your workers (CPUs) effectively. So DataFrame operations
already run in parallel, with each task working on one partition of the data.
Why do you want to use threading here?

Thanks,
Sonal
Nube Technologies <http://www.nubetech.co>

<http://in.linkedin.com/in/sonalgoyal>




On Tue, Nov 12, 2019 at 7:18 AM Chang Chen <baibaichen@gmail.com> wrote:

>
> Hi all
>
> I have a case where I need to cache a source RDD and then create different
> DataFrames from it in different threads to speed up queries.
>
> I know that SparkSession is thread-safe (
> https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
> whether an RDD is thread-safe or not.
>
> Thanks
> Chang
>
