spark-user mailing list archives

From Gylfi
Subject Re: Spark same execution time on 1 node and 5 nodes
Date Sat, 18 Jul 2015 07:56:54 GMT

Looking only at the two screenshots, I see that a single step accounts for
essentially all of the run time:
the flatMapToPair at Coef... line 52.
I also see that the input is made up of only two partitions. Spark runs one
task per partition, so that step can use at most two tasks at a time and
probably only two workers are active, however many nodes the cluster has.

Try repartitioning the data into more parts before line 52, for example by
calling "rddname".repartition(10) (where "rddname" stands for your RDD), and
see if it runs faster.
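
In case it helps, here is a minimal sketch of that suggestion using the Spark
Java API. The code at line 52 is not shown in this thread, so the expensive
function is simply passed in as a parameter; the class name RepartitionSketch,
the method name repartitionThenExpand and the parameter names are made up for
illustration. The value 10 from above would be passed as numPartitions. Note
that repartition() shuffles the data, so it has a cost of its own and is only
worth it if the following step is the real bottleneck.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.PairFlatMapFunction;

// Hypothetical helper, not taken from the original job.
public final class RepartitionSketch {

    private RepartitionSketch() {}

    /**
     * Spreads the input over numPartitions partitions and only then applies
     * the expensive flatMapToPair step, so that step can run as up to
     * numPartitions parallel tasks instead of one task per original partition.
     */
    static <T, K, V> JavaPairRDD<K, V> repartitionThenExpand(
            JavaRDD<T> input,
            int numPartitions,
            PairFlatMapFunction<T, K, V> expensiveStep) {
        // Print the partition counts so the effect is visible in the driver log.
        System.out.println("partitions before: " + input.partitions().size());

        // repartition() shuffles the data into numPartitions partitions.
        JavaRDD<T> spread = input.repartition(numPartitions);
        System.out.println("partitions after: " + spread.partitions().size());

        // Stand-in for the call at line 52 of the original job.
        return spread.flatMapToPair(expensiveStep);
    }
}
```

In the job itself this would amount to replacing
"rddname".flatMapToPair(f) with "rddname".repartition(10).flatMapToPair(f),
where f is whatever function is currently used at line 52.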

