spark-user mailing list archives

From Ian <psilonl...@gmail.com>
Subject Re: How to set the degree of parallelism in Spark SQL?
Date Thu, 26 May 2016 17:45:02 GMT
The number of executors is set when you launch the shell or submit an
application with /spark-submit/. It's controlled by the /--num-executors/ option:
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/.
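
To make that concrete, here is a rough sketch in Python that just assembles the command line. It assumes you are running on YARN (where the --num-executors flag applies); the helper name build_submit_command and the application file my_app.py are made up for illustration.

```python
import shlex


def build_submit_command(app_path, num_executors, executor_cores=1):
    """Return a spark-submit argument list requesting a fixed number of executors.

    Assumes a YARN cluster; app_path is a hypothetical application file.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--num-executors", str(num_executors),
        "--executor-cores", str(executor_cores),
        app_path,
    ]


cmd = build_submit_command("my_app.py", num_executors=5, executor_cores=2)
# Print the command as a single shell-safe string.
print(shlex.join(cmd))
```

The same two flags can be passed to spark-shell when you start an interactive session.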

Also keep in mind that cranking up the number may not make your queries
run faster. If you set it to, say, 200 but you only have 10 cores
spread over 5 nodes, then you may not see a significant speed-up beyond
5-10 executors.
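
The arithmetic behind that can be sketched roughly as follows. This is a simplified model, not how Spark's scheduler is implemented: it assumes each task occupies one core and that the cluster cannot hand out more cores than physically exist. The function name max_concurrent_tasks is made up for illustration.

```python
def max_concurrent_tasks(num_executors, cores_per_executor, total_cores):
    """Upper bound on tasks running at once under a one-task-per-core assumption."""
    requested = num_executors * cores_per_executor
    # The cluster cannot run more tasks in parallel than it has cores.
    return min(requested, total_cores)


# The 10-core cluster from the example, with one core per executor.
for n in (5, 10, 200):
    print(n, max_concurrent_tasks(n, cores_per_executor=1, total_cores=10))
```

Under these assumptions, asking for 200 executors gives the same ceiling as asking for 10, which is why the extra executors buy you nothing.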

You may want to check out Cloudera's tuning guide:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/


