spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: How to increase the number of tasks
Date Fri, 05 Jun 2015 10:13:02 GMT
I want to speed up this task (attached image). It's a repartition task.
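
One possible explanation for the fixed count of 174 tasks in the question quoted below, not confirmed anywhere in this thread: the stage that reads the input usually runs one task per input split, so the argument passed to repartition only changes the task count of the stage after the shuffle. The sketch below is a simplified model of that idea in plain Python. The function name, the file sizes (borrowed from the per-task read sizes in the question) and the split sizes are illustrative assumptions, and real input formats apply extra rules that this ignores.

```python
import math


def estimate_read_tasks(file_sizes_mb, split_size_mb):
    """Estimate read-side tasks under a simplified one-task-per-split model.

    Assumption: every file is cut into chunks of at most split_size_mb,
    and files are never combined into a single split.
    """
    if split_size_mb <= 0:
        raise ValueError("split_size_mb must be positive")
    return sum(
        math.ceil(size / split_size_mb)
        for size in file_sizes_mb
        if size > 0
    )


# Hypothetical input: five files with sizes taken from the numbers in the thread.
sizes_mb = [512, 317, 316, 214, 173]

# With a 512 MB split size, every file fits in one split.
print(estimate_read_tasks(sizes_mb, 512))

# With a 128 MB split size, the same files produce more, smaller splits.
print(estimate_read_tasks(sizes_mb, 128))
```

Under this model, a smaller split size on the input side is what raises the number of read tasks; adding executors alone would leave it unchanged.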

On Fri, Jun 5, 2015 at 3:40 PM, Evo Eftimov <evo.eftimov@isecc.com> wrote:

> The param is for “Default number of partitions in RDDs returned by
> transformations like join, reduceByKey, and parallelize when NOT set by
> user.”
>
>
>
> Deepak, however, is setting the number of partitions EXPLICITLY, so the
> default does not apply.
>
>
>
> *From:* 李铖 [mailto:lidaling1@gmail.com]
> *Sent:* Friday, June 5, 2015 11:08 AM
> *To:* ÐΞ€ρ@Ҝ (๏̯͡๏)
> *Cc:* Evo Eftimov; user
> *Subject:* Re: How to increase the number of tasks
>
>
>
> Just multiply the number of CPU cores on the node by 2-4.
>
>
>
> 2015-06-05 18:04 GMT+08:00 ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>:
>
> I did not change spark.default.parallelism.
>
> What is the recommended value for it?
>
>
>
> On Fri, Jun 5, 2015 at 3:31 PM, 李铖 <lidaling1@gmail.com> wrote:
>
> Did you change the value of 'spark.default.parallelism'? Set it to a
> bigger number.
>
>
>
> 2015-06-05 17:56 GMT+08:00 Evo Eftimov <evo.eftimov@isecc.com>:
>
> It may be that your system runs out of resources (i.e. 174 is the ceiling)
> due to the following:
>
>
>
> 1. RDD Partition = (Spark) Task
>
> 2. RDD Partition != (Spark) Executor
>
> 3. (Spark) Task != (Spark) Executor
>
> 4. (Spark) Task = JVM Thread
>
> 5. (Spark) Executor = JVM instance
>
>
>
> *From:* ÐΞ€ρ@Ҝ (๏̯͡๏) [mailto:deepujain@gmail.com]
> *Sent:* Friday, June 5, 2015 10:48 AM
> *To:* user
> *Subject:* How to increase the number of tasks
>
>
>
> I have a stage that spawns 174 tasks when I run repartition on Avro data.
>
> Tasks read between 512/317/316/214/173 MB of data. Even if I increase the
> number of executors or the number of partitions (when calling repartition),
> the number of tasks launched remains fixed at 174.
>
>
>
> 1) I want to speed up this task. How do I do it?
>
> 2) A few tasks finish in 20 minutes, a few in 15, and a few in less than
> 10. Why does this happen?
>
> Since this is a repartition stage, it should not depend on the nature of
> data.
>
>
>
> It's taking more than 30 minutes and I want to speed it up by throwing more
> executors at it.
>
>
>
> Please suggest.
>
>
>
> Deepak
>
> --
>
> Deepak
>
>
>
>
>
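
The rule of thumb quoted above (2-4 times the number of CPU cores) can be worked through as simple arithmetic. This is only a sketch: the function name and the 8-core figure are hypothetical, and as noted in the thread, spark.default.parallelism only takes effect when the number of partitions is not set explicitly.

```python
def suggested_parallelism(total_cores, tasks_per_core):
    """Return total_cores * tasks_per_core, the rule of thumb from the thread."""
    if total_cores < 1 or tasks_per_core < 1:
        raise ValueError("total_cores and tasks_per_core must be at least 1")
    return total_cores * tasks_per_core


# Hypothetical node with 8 cores; the thread suggests a factor of 2 to 4.
cores = 8
low = suggested_parallelism(cores, 2)
high = suggested_parallelism(cores, 4)
print(f"Try a parallelism between {low} and {high} for {cores} cores")
```

The same number could also be passed directly to repartition when the partition count is set by hand.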



-- 
Deepak
