spark-user mailing list archives

From shahab <shahab.mok...@gmail.com>
Subject Which is more efficient : first join three RDDs and then do filtering or vice versa?
Date Thu, 12 Mar 2015 15:04:36 GMT
Hi,

This question has probably been answered on the mailing list before,
but I couldn't find it. Sorry for posting it again.

I need to join three different RDDs and apply filtering to them, and I
wonder which of the following alternatives is more efficient:
1- First join all three RDDs and then filter the resulting joined RDD
  or
2- Apply filtering to each individual RDD and then join the resulting RDDs


Or is there probably no difference, due to lazy evaluation and Spark's
underlying optimisation?
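
To make the two alternatives concrete, here is a minimal sketch in plain
Python rather than Spark. The data, the predicates and the helper `join`
are all made up for illustration. It only shows that, when each filter
looks at a single dataset's own values, both orderings give the same
records while filtering first makes the intermediate join smaller. It says
nothing about what Spark does internally.

```python
# Plain-Python stand-in for three keyed RDDs (hypothetical data, no Spark).
a = [(1, "a1"), (2, "a2"), (3, "a3")]
b = [(1, 10), (2, 20), (3, 30)]
c = [(1, True), (2, False), (3, True)]


def join(left, right):
    """Inner join on key: (k, v) and (k, w) become (k, (v, w))."""
    index = {}
    for k, w in right:
        index.setdefault(k, []).append(w)
    return [(k, (v, w)) for k, v in left for w in index.get(k, [])]


# Hypothetical predicates; each one looks only at its own dataset's value.
def keep_a(v):
    return v != "a2"


def keep_b(w):
    return w >= 20


def keep_c(x):
    return x


def join_then_filter():
    # Alternative 1: join all three, then filter the joined records.
    first = join(a, b)
    joined = join(first, c)  # records look like (k, ((va, vb), vc))
    result = [
        (k, ((va, vb), vc))
        for k, ((va, vb), vc) in joined
        if keep_a(va) and keep_b(vb) and keep_c(vc)
    ]
    return result, len(first)


def filter_then_join():
    # Alternative 2: filter each dataset on its own, then join the rest.
    fa = [(k, v) for k, v in a if keep_a(v)]
    fb = [(k, w) for k, w in b if keep_b(w)]
    fc = [(k, x) for k, x in c if keep_c(x)]
    first = join(fa, fb)
    return join(first, fc), len(first)


if __name__ == "__main__":
    res1, size1 = join_then_filter()
    res2, size2 = filter_then_join()
    print("same result:", res1 == res2)
    print("records after the first join:", size1, "vs", size2)
```

The second number printed is the size of the first intermediate join in
each alternative, which is the part I am wondering about on a real cluster.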

best,
/Shahab
