spark-user mailing list archives

From Gylfi <gy...@berkeley.edu>
Subject Re: Spark same execution time on 1 node and 5 nodes
Date Sat, 18 Jul 2015 07:56:54 GMT
Hi,

If I just look at the two screenshots, I see that there is only one sub-task
that takes all the time: the flatMapToPair at Coef... (line 52).
I also see that the input is made up of only two partitions, so
probably only two workers are active.

Try repartitioning the data into more parts before line 52, e.g. by calling
"rddname".repartition(10), and see if it runs faster.
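To see why the number of partitions caps the speedup, here is a toy model (plain Python, not Spark; it assumes the work splits evenly and tasks run in waves, one task per core): a stage's time is roughly (number of waves) x (cost per task), and with only 2 partitions you get one wave no matter how many cores you add.

```python
import math

def stage_time(total_work, num_partitions, num_cores):
    """Toy model of a Spark stage: the work is split evenly into
    num_partitions tasks, and tasks run in waves of num_cores at a time."""
    task_cost = total_work / num_partitions
    waves = math.ceil(num_partitions / num_cores)
    return waves * task_cost

# With only 2 partitions, 1 node (2 cores) and 5 nodes (10 cores)
# finish in the same time -- the extra cores sit idle:
print(stage_time(100, 2, 2))    # -> 50.0
print(stage_time(100, 2, 10))   # -> 50.0
# After repartition(10), all 10 cores get a task:
print(stage_time(100, 10, 10))  # -> 10.0
```

This is of course a simplification (it ignores shuffle cost and skew), but it matches the symptom in the thread: with 2 input partitions, 1 node and 5 nodes run the stage in the same time.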

Regards, 
    Gylfi. 



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-same-execution-time-on-1-node-and-5-nodes-tp23866p23893.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

