I did not change spark.default.parallelism.
What is the recommended value for it?

On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <lidaling1@gmail.com> wrote:
Did you change the value of 'spark.default.parallelism'? Try a bigger number.
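For example, a minimal sketch of setting it programmatically (Scala, Spark 1.x; the app name and the value 200 are placeholders, and the Spark tuning guide's rough guidance is 2-3 tasks per CPU core in the cluster):

import org.apache.spark.{SparkConf, SparkContext}

// Placeholder value: pick something proportional to the total cores in the cluster.
val conf = new SparkConf()
  .setAppName("repartition-avro")
  .set("spark.default.parallelism", "200")
val sc = new SparkContext(conf)

Note that repartition(n) passes its partition count explicitly, so spark.default.parallelism mainly affects shuffles where no count is specified.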

2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.eftimov@isecc.com>:

It may be that your system runs out of resources (i.e., 174 is the ceiling) due to the following:

 

1. RDD Partition = (Spark) Task

2. RDD Partition != (Spark) Executor

3. (Spark) Task != (Spark) Executor

4. (Spark) Task = JVM Thread

5. (Spark) Executor = JVM instance
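For illustration, a sketch of how those relationships turn into a concurrency ceiling (Scala; the property names are standard Spark settings, but the numbers are placeholders, and spark.executor.instances applies when running on YARN):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.instances", "10") // executors = JVM instances
  .set("spark.executor.cores", "4")      // concurrent task slots (JVM threads) per executor

// At most instances * cores tasks (i.e. RDD partitions) run at the same time:
// with these placeholder numbers the ceiling would be 10 * 4 = 40 concurrent tasks.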

 

From: ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepujain@gmail.com]
Sent: Friday, June 5, 2015 10:48 AM
To: user
Subject: How to increase the number of tasks

 

I have a stage that spawns 174 tasks when I run repartition on Avro data.

Tasks read varying amounts of data (512/317/316/214/173 MB). Even if I increase the number of executors or the number of partitions (when calling repartition), the number of tasks launched remains fixed at 174.
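For what it's worth, a self-contained sketch (Scala, local mode; the synthetic RDD, the 174 slices, and the 400 target are placeholders) showing that the stage that reads the input has one task per input partition, while the stage after the shuffle uses the repartition target:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("partition-count-check").setMaster("local[4]"))

// Stand-in for the Avro input; a real job would load the Avro files here instead.
val inputRdd = sc.parallelize(1 to 1000000, numSlices = 174)

println(inputRdd.partitions.length)            // read-side stage: 174 tasks, one per input partition
val repartitioned = inputRdd.repartition(400)  // 400 is a placeholder target
println(repartitioned.partitions.length)       // post-shuffle stage: 400 tasks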

 

1) I want to speed up this stage. How do I do it?

2) Some tasks finish in 20 minutes, some in 15, and some in less than 10. Why is that?

Since this is a repartition stage, it should not depend on the nature of the data.

 

It's taking more than 30 minutes, and I want to speed it up by throwing more executors at it.

 

Please suggest.

 

Deepak

 





--
Deepak