spark-user mailing list archives

From Palash Gupta <spline_pal...@yahoo.com.INVALID>
Subject Re: Spark #cores
Date Thu, 19 Jan 2017 03:23:37 GMT
Hi,
I think I faced the same problem with Spark 2.1.0 when I tried to define the number of executors
from SparkConf or the SparkSession builder in a standalone cluster. It always takes all available
cores.
There are three ways to do it:
1. Define spark.executor.cores in conf/spark-defaults.conf; spark-submit will read it from
there when you run it. You need to set it on all hosts in the cluster.
2. Pass the parameter to spark-submit, which you have already done.
3. Set the parameter at runtime (in your code, while creating the SparkSession or SparkConf
object), e.g. in Python:
    from pyspark import SparkConf, SparkContext
    from pyspark.sql import SQLContext

    # Configure Spark (APP_NAME is assumed to be defined elsewhere)
    conf = (SparkConf()
            .setAppName(APP_NAME)
            .set("spark.default.parallelism", "20")
            .set("spark.cores.max", "5")
            .set("spark.executor.cores", "2")
            .set("spark.driver.memory", "10g")
            .set("spark.executor.memory", "5g")
            .set("spark.storage.memoryFraction", "0.4")
            .set("spark.local.dir", "/tmp/my-app/"))
    sc = SparkContext(conf=conf)
    sqlContext = SQLContext(sc)
You can also set this kind of parameter through SparkSession in the same way.
Try options #1 and #2 and check in the UI how many cores are used with your
settings.
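
As a rough illustration of options #1 and #2, the sketch below renders the same settings either as
lines for conf/spark-defaults.conf or as --conf flags for spark-submit. The helper names, the master
host "master-host" and the script name "my_app.py" are hypothetical placeholders, not something from
this thread.

```python
import shlex

# Settings taken from the example above; adjust them for your cluster.
settings = {
    "spark.cores.max": "5",
    "spark.executor.cores": "2",
    "spark.executor.memory": "5g",
}

def to_defaults_conf(conf):
    """Render settings as 'key value' lines for conf/spark-defaults.conf (option 1)."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())

def to_submit_args(conf, master, app):
    """Build a spark-submit argument list with one --conf flag per setting (option 2)."""
    args = ["spark-submit", "--master", master]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args

print(to_defaults_conf(settings))
# "master-host" and "my_app.py" are hypothetical placeholders.
print(shlex.join(to_submit_args(settings, "spark://master-host:7077", "my_app.py")))
```

Keeping the settings in one dictionary makes it easy to try both options with identical values and
compare what the UI reports.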

Also, since you are running in standalone cluster mode, I hope you have set the --master parameter
to a spark:// URL.

Best Regards,
P. Gupta


On Wed, 18 Jan, 2017 at 11:33 pm, Saliya Ekanayake <esaliya@gmail.com> wrote:
The Spark version I am using is 2.10. The language is Scala. This is running in standalone cluster
mode.
Each worker is able to use all physical CPU cores in the cluster as is the default case.
I was using the following parameters to spark-submit
--conf spark.executor.cores=1 --conf spark.default.parallelism=32

Later, I read that the term "cores" doesn't mean physical CPU cores but rather the number of tasks
that an executor can run at the same time.
Anyway, I don't have a clear idea how to set the number of executors per physical node. I
see there's an option for this in YARN mode, but it's not available in standalone cluster mode.
Thank you,
Saliya
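
The relationship between these settings can be sketched with a little arithmetic. This is only an
estimate under stated assumptions: in standalone mode an application is given executors of
spark.executor.cores cores each until either the workers' free cores or spark.cores.max runs out,
and memory limits are ignored here. The function name and the 25-cores-per-worker figure are
hypothetical; the latter is chosen only so that 8 workers add up to the 200 cores shown in the UI.

```python
def estimate_executors(worker_cores, executor_cores, cores_max=None):
    """Estimate (executors, task_slots) for one application on a standalone cluster.

    worker_cores:   free cores offered by each worker
    executor_cores: value of spark.executor.cores
    cores_max:      value of spark.cores.max, or None if it is not set
    Memory limits are ignored in this sketch.
    """
    # Each worker can host as many executors as fit into its free cores.
    executors = sum(cores // executor_cores for cores in worker_cores)
    if cores_max is not None:
        # The application-wide cap limits the total number of executors.
        executors = min(executors, cores_max // executor_cores)
    return executors, executors * executor_cores

# Hypothetical cluster: 8 workers offering 25 cores each (200 in total).
workers = [25] * 8
print(estimate_executors(workers, executor_cores=1))                # (200, 200)
print(estimate_executors(workers, executor_cores=1, cores_max=32))  # (32, 32)
print(estimate_executors(workers, executor_cores=2, cores_max=5))   # (2, 4)
```

Under these assumptions, setting spark.executor.cores alone does not reduce the total number of
cores used; capping the application with spark.cores.max, or limiting the cores each worker offers,
does.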
On Wed, Jan 18, 2017 at 12:13 PM, Palash Gupta <spline_palash@yahoo.com> wrote:

Hi,
Can you please share how you are assigning CPU cores, and tell us the Spark version and language
you are using?
//Palash

On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake <esaliya@gmail.com> wrote:
Thank you for the quick response. No, this is not Spark SQL. I am running the built-in PageRank.
On Wed, Jan 18, 2017 at 10:33 AM, <jasbir.sing@accenture.com> wrote:


Are you talking about Spark SQL here?

If yes, spark.sql.shuffle.partitions needs to be changed.

 

From: Saliya Ekanayake [mailto:esaliya@gmail.com]
Sent: Wednesday, January 18, 2017 8:56 PM
To: User <user@spark.apache.org>
Subject: Spark #cores

 

Hi,

 

I am running a Spark application with the number of executor cores set to 1 and a default
parallelism of 32 over 8 physical nodes.

 

The web UI shows it's running on 200 cores. I can't relate this number to the parameters I've
used. How can I control the parallelism in a more deterministic way?

 

Thank you,

Saliya

 

-- 

Saliya Ekanayake, Ph.D

Applied Computer Scientist

Network Dynamics and Simulation Science Laboratory (NDSSL)

Virginia Tech, Blacksburg

 










