spark-user mailing list archives

From "Kartheek.R" <kartheek.m...@gmail.com>
Subject Inconsistent execution times for same application.
Date Sun, 15 Feb 2015 15:15:55 GMT
Hi,
My Spark cluster contains a mix of machines: Pentium 4, dual-core and
quad-core. I am running a character frequency count application. The
application starts several threads, each submitting a job (action) that
counts the frequency of a single character. My problem is that I get
different execution times each time I run the same application on the same
data (1 GB of text). Sometimes the difference is as large as 10-15 minutes.
I think this is related to scheduling, since the cluster is heterogeneous.
Can someone please tell me how to tackle this issue? I need consistent
timings. Any suggestions are welcome.

I cache() the RDD. There are 7 slave nodes in total, and executor memory is
set to 2500m.




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Inconsistent-execution-times-for-same-application-tp21662.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.