spark-user mailing list archives

From Tim Chen
Subject Re: Data locality running Spark on Mesos
Date Thu, 08 Jan 2015 20:04:46 GMT
How did you run this benchmark, and is there an open version I can try?

And what are your configuration settings, such as spark.locality.wait?
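
For reference, these are the kinds of settings I mean. This is only a sketch of a spark-defaults.conf fragment: the property names are standard Spark options, but the values shown are illustrative assumptions (3000 ms is the documented default for spark.locality.wait in Spark 1.x), not your actual configuration.

```properties
# Illustrative values only; replace with what your cluster actually uses.

# How long (in milliseconds) the scheduler waits for a more local slot
# before falling back to the next locality level.
spark.locality.wait          3000

# Optional per-level overrides; each defaults to spark.locality.wait.
spark.locality.wait.process  3000
spark.locality.wait.node     3000
spark.locality.wait.rack     3000

# Mesos run mode: true = coarse-grained, false = fine-grained.
spark.mesos.coarse           true
```

Knowing which of these you changed between the stand-alone and Mesos runs would help narrow down why tasks end up NODE_LOCAL instead of PROCESS_LOCAL.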


On Thu, Jan 8, 2015 at 11:44 AM, mvle wrote:

> Hi,
> I've noticed that running Spark apps on Mesos is significantly slower than
> stand-alone or Spark on YARN.
> I don't think this should be the case, so I am posting the problem here in
> case someone has an explanation or can point me to some configuration
> options I've missed.
> I'm running the LinearRegression benchmark with a dataset of 48.8GB.
> On a 10-node stand-alone Spark cluster (each node 4-core, 8GB of RAM),
> I can finish the workload in about 5min (I don't remember exactly).
> The data is loaded into HDFS spanning the same 10-node cluster.
> There are 6 worker instances per node.
> However, when running the same workload on the same cluster but now with
> Spark on Mesos (coarse-grained mode), the execution time is somewhere around
> 15min. I also tried fine-grained mode, giving each Mesos node 6 VCPUs (to
> hopefully get 6 executors like the stand-alone test), and I still get
> roughly 15min.
> I've noticed that when Spark is running on Mesos, almost all tasks execute
> with locality NODE_LOCAL (even in Mesos in coarse-grained mode). On
> stand-alone, the locality is mostly PROCESS_LOCAL.
> I think this locality issue might be the reason for the slowdown, but I
> can't figure out why, especially for coarse-grained mode, as the executors
> supposedly do not go away until job completion.
> Any ideas?
> Thanks,
> Mike