spark-user mailing list archives

From "Kartheek.R"
Subject Inconsistent execution times for same application.
Date Sun, 15 Feb 2015 15:15:55 GMT
My Spark cluster is heterogeneous: it contains Pentium 4, dual-core, and
quad-core machines. I am trying to run a character frequency count
application. The application contains several threads, each submitting a
job (action) that counts the frequency of a single character. My problem is
that I get different execution times each time I run the same application
on the same data (1 GB of text). Sometimes the difference is as large as
10-15 minutes. I think this is related to how tasks are scheduled when the
cluster is heterogeneous. Can someone please tell me how to tackle this
issue? I need consistent execution times. Any suggestions, please!

I cache() the RDD. There are 7 slave nodes in total, and executor memory
is 2500m.
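
For reference, here is a minimal sketch of the application structure
described above: one cached RDD, and one thread per character, each
submitting its own counting job. It assumes PySpark, which the message does
not specify. The names count_char, count_all, main, the app name and the
input path are all hypothetical, not the actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def count_char(rdd, ch):
    """Run one job: count occurrences of the single character `ch`."""
    # sum() is the action, so every call to this function submits its own job.
    return rdd.map(lambda line: line.count(ch)).sum()


def count_all(rdd, chars):
    """Submit one counting job per character, each from its own thread."""
    chars = list(chars)
    with ThreadPoolExecutor(max_workers=len(chars)) as pool:
        futures = {ch: pool.submit(count_char, rdd, ch) for ch in chars}
        return {ch: f.result() for ch, f in futures.items()}


def main(path):
    # Assumption: PySpark is installed; `path` is a hypothetical input location.
    from pyspark import SparkContext

    sc = SparkContext(appName="CharFrequency")
    rdd = sc.textFile(path).cache()  # cache() so all jobs reuse the same data
    print(count_all(rdd, "abcdefghijklmnopqrstuvwxyz"))
    sc.stop()
```

Calling main() with the location of the text file from a script launched
via spark-submit would run all the per-character jobs concurrently against
the same cached RDD, which is the setup whose run time varies between runs.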
