spark-user mailing list archives

From "Evo Eftimov" <>
Subject RE: How to increase the number of tasks
Date Fri, 05 Jun 2015 10:10:12 GMT
The param is for “Default number of partitions in RDDs returned by transformations like join,
reduceByKey, and parallelize when NOT set by user.”


Deepak, however, is setting the number of partitions EXPLICITLY, so that default does not apply here.
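The precedence Evo is pointing at can be sketched as a tiny rule (a hedged model of the quoted documentation, not Spark source): an explicit numPartitions argument wins, and spark.default.parallelism is only the fallback when no count is passed.

```python
def effective_partitions(explicit_num_partitions, default_parallelism):
    """Model of the quoted rule: spark.default.parallelism is only a
    fallback for transformations where the user did NOT pass a count."""
    if explicit_num_partitions is not None:
        return explicit_num_partitions
    return default_parallelism

# Deepak passes an explicit count to repartition(), so the default is ignored:
print(effective_partitions(400, 8))   # -> 400, explicit value wins
print(effective_partitions(None, 8))  # -> 8, fallback used
```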


From: 李铖 [] 
Sent: Friday, June 5, 2015 11:08 AM
To: ÐΞ€ρ@Ҝ (๏̯͡๏)
Cc: Evo Eftimov; user
Subject: Re: How to increase the number of tasks


Just multiply the CPU core count of the node by 2-4.
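That rule of thumb (cores × 2-4) is easy to write down. In this single-node sketch `os.cpu_count()` stands in for the core count; on a real cluster you would use total cores across executors instead.

```python
import os

def suggested_parallelism(total_cores, factor=3):
    """Rule of thumb from this thread: 2-4 tasks per CPU core.
    factor=3 is the midpoint of that range."""
    return total_cores * factor

cores = os.cpu_count() or 8  # single-node stand-in for the cluster's cores
print(suggested_parallelism(cores))
print(suggested_parallelism(40, factor=2))  # -> 80, low end for 40 cores
print(suggested_parallelism(40, factor=4))  # -> 160, high end for 40 cores
```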


2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <>:

I did not change spark.default.parallelism.

What is the recommended value for it?


On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <> wrote:

Did you change the value of 'spark.default.parallelism'? Try a bigger number.


2015-06-05 17:56 GMT+08:00 Evo Eftimov <>:

It may be that your system runs out of resources (i.e. 174 is the ceiling) due to the following:


1. RDD Partition = (Spark) Task

2. RDD Partition != (Spark) Executor

3. (Spark) Task != (Spark) Executor

4. (Spark) Task = JVM Thread

5. (Spark) Executor = JVM instance
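Since each task is a JVM thread inside an executor JVM (points 4-5), the cluster can only run executors × cores-per-executor tasks at once; a stage with more tasks than that runs in several "waves". A rough model (the executor and core counts below are made up for illustration, not taken from Deepak's cluster):

```python
import math

def task_waves(num_tasks, num_executors, cores_per_executor):
    """Tasks run in waves of at most (executors * cores) concurrent threads."""
    slots = num_executors * cores_per_executor
    return slots, math.ceil(num_tasks / slots)

# Hypothetical cluster: 10 executors x 4 cores = 40 concurrent task slots,
# so 174 tasks need 5 waves. Adding executors shortens the stage even
# though the task COUNT stays fixed at 174.
slots, waves = task_waves(174, 10, 4)
print(slots, waves)  # -> 40 5
```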


From: ÐΞ€ρ@Ҝ (๏̯͡๏) [] 
Sent: Friday, June 5, 2015 10:48 AM
To: user
Subject: How to increase the number of tasks


I have a stage that spawns 174 tasks when I run repartition on Avro data.

Tasks read between 512/317/316/214/173 MB of data. Even if I increase the number of executors or
the number of partitions (when calling repartition), the number of tasks launched remains fixed
at 174.


1) I want to speed up this stage. How do I do it?

2) Some tasks finish in 20 minutes, some in 15, and some in less than 10. Why this behavior?

Since this is a repartition stage, it should not depend on the nature of the data.
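The per-task input sizes listed above (512/317/316/214/173 MB) may already explain the uneven run times: if task time scales roughly with bytes read, the largest partition takes about 3x as long as the smallest, which matches the ~20-minute vs sub-10-minute spread. A quick check on the numbers from the message:

```python
sizes_mb = [512, 317, 316, 214, 173]  # per-task input sizes from the message

# Ratio of the biggest to the smallest partition.
skew = max(sizes_mb) / min(sizes_mb)
print(round(skew, 2))  # -> 2.96, largest task reads ~3x the smallest
```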


It's taking more than 30 minutes, and I want to speed it up by throwing more executors at it.


Please suggest

