spark-user mailing list archives

From "Shuai Zheng" <>
Subject RE: Executor parameter doesn't work for Spark-shell on EMR Yarn
Date Thu, 15 Jan 2015 21:03:17 GMT
Forgot to mention: I use EMR AMI 3.3.1, Spark 1.2.0, Yarn 2.4. Spark is
set up by the standard script:



From: Shuai Zheng [] 
Sent: Thursday, January 15, 2015 3:52 PM
Subject: Executor parameter doesn't work for Spark-shell on EMR Yarn


Hi All,


I am testing Spark on an EMR cluster. The environment is a one-node
r3.8xlarge cluster with 32 vCores and 244G of memory.


But the command line I use to start up spark-shell doesn't work as expected:


~/spark/bin/spark-shell --jars
/home/hadoop/vrisc-lib/aws-java-sdk-1.9.14/lib/*.jar --num-executors 6
--executor-memory 10G


Neither the num-executors nor the executor-memory setting takes effect.


More interestingly, if I use this test code:

val lines = sc.parallelize(List("-240990|161327,9051480,0,2,30.48,75",

var count = lines.mapPartitions(dynamoDBBatchWriteFunc).collect.sum


It will start 32 executors (so I assume it tries to start one executor for
every vCore).
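The 32 tasks are consistent with sc.parallelize falling back to
spark.default.parallelism, which tracks the core count. A minimal local-mode
sketch of that behavior (the object name and local[4] master are mine, not
from the thread; I use partitions.length since it works on Spark 1.2):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: parallelize() without an explicit partition count falls back to
// spark.default.parallelism, which equals the core count in local mode --
// so on a 32-vCore box you would see 32 tasks, one per core.
object DefaultParallelismDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("default-parallelism"))
    val rdd = sc.parallelize(1 to 100)  // no partition count given
    println(rdd.partitions.length)      // 4 on local[4]; 32 on a 32-core box
    sc.stop()
  }
}
```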


But if I run it on some real data (the file size is 200M):

val lines = sc.textFile("s3://.../part-r-00000") 

var count = lines.mapPartitions(dynamoDBBatchWriteFunc).collect.sum

It will only start 4 executors, which matches the number of HDFS splits (a
200M file has 4 splits).
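For sc.textFile the task count is driven by the input splits, not by
--num-executors, but it can be widened with a minPartitions hint or an
explicit repartition. A hedged sketch (path, counts, and the in-memory
stand-in for a 2-split file are illustrative, not from the thread):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: textFile's parallelism comes from the input splits, but a
// minPartitions hint or repartition() can raise it.
object SplitDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[4]").setAppName("split-demo"))
    // minPartitions hint (path is illustrative):
    // val lines = sc.textFile("s3://bucket/part-r-00000", 32)
    val lines = sc.parallelize(Seq("a", "b", "c"), 2) // stand-in for a 2-split file
    val wider = lines.repartition(8)                  // shuffle into 8 partitions
    println(wider.partitions.length)                  // 8
    sc.stop()
  }
}
```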


So I have two questions:

1, Why are the setup parameters ignored by Yarn? How can I limit the number
of executors that run?

2, Why does my much smaller test data set trigger 32 executors while my real
200M data set only gets 4 executors?


So how should I control the executor setup for spark-shell? When I print
the SparkConf, it contains much less than I expect, and I don't see the
parameters I passed in:


scala> sc.getConf.getAll.foreach(println)

(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70)

(,Spark shell)
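For reference, the two flags map to the SparkConf keys below (this mapping is
standard for Yarn mode; the values echo the command above), so if they were
honored they should show up in the sc.getConf.getAll output:

```scala
import org.apache.spark.SparkConf

// Sketch of the conf keys the spark-shell flags correspond to; if the flags
// were applied, sc.getConf.getAll would list these entries.
object FlagKeys {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(false)                // don't load system defaults
      .set("spark.executor.instances", "6")       // --num-executors 6
      .set("spark.executor.memory", "10g")        // --executor-memory 10G
    conf.getAll.foreach(println)
  }
}
```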

I searched the old threads; the attached email answers the question of why
the vCore setting doesn't work, but I don't think that is the same issue as
mine. Otherwise the default Yarn Spark setup couldn't be adjusted at all?
