spark-user mailing list archives

From innowireless TaeYun Kim <taeyun....@innowireless.co.kr>
Subject The number of cores vs. the number of executors
Date Tue, 08 Jul 2014 00:42:46 GMT
Hi,

 

I'm trying to understand the relationship of the number of cores and the
number of executors when running a Spark job on YARN.

 

The test environment is as follows:

 

- # of data nodes: 3

- Data node machine spec:

- CPU: Core i7-4790 (# of cores: 4, # of threads: 8)

- RAM: 32GB (8GB x 4)

- HDD: 8TB (2TB x 4)

- Network: 1Gb

 

- Spark job flow: sc.textFile -> filter -> map -> filter -> mapToPair ->
reduceByKey -> map -> saveAsTextFile
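For reference, the transformation flow above can be sketched in plain Python on a toy dataset (the filter predicates, the split logic, and the key/value shapes here are hypothetical placeholders, not the actual job's logic):

```python
from collections import defaultdict

# Toy stand-ins for each Spark transformation in the flow above.
lines = ["a 1", "b 2", "a 3", "c 4"]            # sc.textFile
lines = [l for l in lines if l]                  # filter
records = [l.split() for l in lines]             # map
records = [r for r in records if len(r) == 2]    # filter
pairs = [(k, int(v)) for k, v in records]        # mapToPair

# reduceByKey: combine values that share a key (here: sum)
acc = defaultdict(int)
for k, v in pairs:
    acc[k] += v

result = [f"{k}\t{v}" for k, v in sorted(acc.items())]  # map
print("\n".join(result))                         # saveAsTextFile (stand-in)
```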

 

- input data

- type: single text file

- size: 165GB

- # of lines: 454,568,833

 

- output

- # of lines after second filter: 310,640,717

- # of lines of the result file: 99,848,268

- size of the result file: 41GB

 

The job was run with the following configurations:

 

1) --master yarn-client --executor-memory 19G --executor-cores 7
--num-executors 3  (one executor per data node, using as many cores as possible)

2) --master yarn-client --executor-memory 19G --executor-cores 4
--num-executors 3  (number of cores reduced)

3) --master yarn-client --executor-memory 4G --executor-cores 2
--num-executors 12  (fewer cores per executor, more executors)
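As a sanity check, the per-node footprint of each configuration can be tallied (assuming YARN spreads the executors evenly across the 3 data nodes, which is an assumption, not a given):

```python
# Per-node resource totals for each configuration, assuming an even
# spread of executors over the 3 data nodes.
nodes = 3
configs = {
    1: {"executors": 3,  "cores": 7, "mem_gb": 19},
    2: {"executors": 3,  "cores": 4, "mem_gb": 19},
    3: {"executors": 12, "cores": 2, "mem_gb": 4},
}

for n, c in configs.items():
    per_node = c["executors"] / nodes
    print(f"{n}) {per_node:g} executor(s)/node, "
          f"{per_node * c['cores']:g} cores/node, "
          f"{per_node * c['mem_gb']:g} GB/node")
```

Under that assumption, 3) is the only configuration that occupies all 8 hardware threads of each node (4 executors x 2 cores), while 1) and 2) leave threads idle.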

 

- elapsed times:

1) 50 min 15 sec

2) 55 min 48 sec

3) 31 min 23 sec

 

To my surprise, 3) was much faster.

I expected 1) to be faster, since there would be less inter-executor
communication during the shuffle.

And although 1) uses fewer total cores than 3), the number of cores does not
seem to be the key factor, since 2) performed comparably to 1) despite having
even fewer cores.

 

How can I understand this result?

 

Thanks.
