Hi Anny, SPARK_WORKER_INSTANCES is the number of Spark worker processes running on a single machine.  Changing it changes how that machine's hardware is split up (useful for breaking a large server into several workers with heaps under 32GB each, which perform better), but it doesn't change the amount of hardware you have.  Because the hardware is the same, you won't see a big performance improvement unless you were in the huge-heap scenario to begin with.

Typically you should configure the parameters so that SPARK_WORKER_CORES * SPARK_WORKER_INSTANCES equals the number of cores on your machine.  SPARK_WORKER_CORES applies to each worker, so on an 8-core box you should lower it as you raise SPARK_WORKER_INSTANCES: 1 instance with 8 cores, 2 instances with 4 cores each, or 4 instances with 2 cores each.
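
To make the arithmetic concrete, here is a minimal sketch in Python. The helper names (worker_cores, spark_env_lines) are made up for illustration and are not part of Spark; the only real names are the two environment variables discussed above, which would normally go in conf/spark-env.sh.

```python
def worker_cores(total_cores: int, worker_instances: int) -> int:
    """Cores to give each worker so that cores * instances <= total_cores."""
    if worker_instances < 1:
        raise ValueError("worker_instances must be at least 1")
    if total_cores < worker_instances:
        raise ValueError("need at least one core per worker instance")
    # Floor division: if the split is uneven, the leftover cores stay unused.
    return total_cores // worker_instances


def spark_env_lines(total_cores: int, worker_instances: int) -> list[str]:
    """Build the two export lines for a given machine size (illustrative only)."""
    cores = worker_cores(total_cores, worker_instances)
    return [
        f"export SPARK_WORKER_INSTANCES={worker_instances}",
        f"export SPARK_WORKER_CORES={cores}",
    ]


if __name__ == "__main__":
    # Example: an 8-core box split into 2 workers.
    for line in spark_env_lines(8, 2):
        print(line)
```

Either way the product stays at 8 cores, which is why changing only how the box is split does not by itself make jobs faster.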


On Mon, Oct 20, 2014 at 3:21 PM, anny9699 <anny9699@gmail.com> wrote:

I have a question about the worker_instances and worker_cores settings
in an AWS EC2 cluster. I understand it is a cluster and the default
setting in the cluster is


However after I changed it to


It seems the speed doesn't change very much. Could anyone explain why,
maybe with more details about worker_cores vs worker_instances?

Thanks a lot!
