spark-user mailing list archives

From shahab <>
Subject Which is more efficient : first join three RDDs and then do filtering or vice versa?
Date Thu, 12 Mar 2015 15:04:36 GMT

This question has probably been answered on the mailing list before,
but I couldn't find it. Sorry for posting it again.

I need to join three different RDDs and apply filtering to them. I just
wonder which of the following alternatives is more efficient:
1- First join all three RDDs and then filter the resulting joined RDD
2- Filter each individual RDD and then join the resulting RDDs

Or is there perhaps no difference, because of lazy evaluation and Spark's
underlying optimisation?
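
To make the two alternatives concrete, here is a minimal sketch in plain
Python rather than Spark. The join helper, the sample data and the three
predicates are all hypothetical stand-ins (assumptions for illustration, not
Spark API). The sketch also assumes that each filter depends only on values
from a single input, so it can be applied either before or after the joins.
It counts how many records are fed into the joins under each ordering.

```python
from collections import defaultdict


def join(left, right):
    """Inner join of two lists of (key, value) pairs.

    Hypothetical stand-in for a pair-RDD join; this is not Spark code.
    """
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, ())]


# Made-up sample data: three keyed collections sharing keys 0..9.
a = [(i, i) for i in range(10)]
b = [(i, i * 10) for i in range(10)]
c = [(i, i * 100) for i in range(10)]


# Made-up predicates, each looking at values from one input only.
def keep_a(v):
    return v % 2 == 0


def keep_b(v):
    return v < 50


def keep_c(v):
    return v >= 200


def join_then_filter():
    """Alternative 1: join all three inputs, then filter the joined rows."""
    ab = join(a, b)
    abc = join(ab, c)
    joined_inputs = len(a) + len(b) + len(ab) + len(c)
    result = [
        (k, ((av, bv), cv))
        for k, ((av, bv), cv) in abc
        if keep_a(av) and keep_b(bv) and keep_c(cv)
    ]
    return result, joined_inputs


def filter_then_join():
    """Alternative 2: filter each input on its own, then join."""
    fa = [(k, v) for k, v in a if keep_a(v)]
    fb = [(k, v) for k, v in b if keep_b(v)]
    fc = [(k, v) for k, v in c if keep_c(v)]
    ab = join(fa, fb)
    abc = join(ab, fc)
    joined_inputs = len(fa) + len(fb) + len(ab) + len(fc)
    return abc, joined_inputs


result_1, cost_1 = join_then_filter()
result_2, cost_2 = filter_then_join()
print("same rows:", result_1 == result_2)
print("records fed into joins:", cost_1, "vs", cost_2)
```

On this toy data both orderings return the same rows, but filtering first
feeds fewer records into the joins. The sketch does not model Spark's
scheduler or shuffle cost, so whether Spark narrows that gap by itself for
RDDs remains the open question.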

