spark-user mailing list archives

From ayan guha
Subject Re: No. of Task vs No. of Executors
Date Tue, 14 Jul 2015 14:35:15 GMT

As you can see, Spark has taken data locality into consideration and
scheduled all tasks as node-local. Because Spark could run each task on a
node where its data is present, it went ahead and scheduled the tasks
there. This is actually good for reading. If you really want to fan out
processing, you can call repartition(n).

Regarding slowness: as you can see, another task completed successfully
in 6 minutes on executor ID 2, so it does not seem that the node itself is
slow. It is possible that the computation for one node is skewed. You may
want to switch on speculative execution to see whether the same task
completes faster on another node. If yes, then it's a node issue; otherwise
it is most likely data skew.
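
A rough sketch of how the skew check could be done by hand, assuming you
can collect a record count per partition. The helper name
find_skewed_partitions, the factor threshold, and the sample counts are all
made up for illustration; they are not from the job in question. The Spark
calls appear only in comments so the snippet runs without a cluster.

```python
from statistics import median


def find_skewed_partitions(sizes, factor=2.0):
    """Return indices of partitions holding more than `factor` times the
    median record count. `factor` is an arbitrary illustrative threshold."""
    if not sizes:
        return []
    typical = median(sizes)
    if typical <= 0:
        return []
    return [i for i, n in enumerate(sizes) if n > factor * typical]


# With PySpark, per-partition counts could be gathered like this:
#   sizes = rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
# If one partition dominates, spreading the data may help:
#   rdd = rdd.repartition(n)
# Speculative execution is enabled with the spark.speculation property:
#   conf = SparkConf().set("spark.speculation", "true")

# Hypothetical counts for 9 partitions, the last one much larger.
sample_sizes = [100, 110, 95, 105, 100, 98, 102, 99, 400]
print(find_skewed_partitions(sample_sizes))  # prints [8]
```

If the slow task maps to a flagged partition, the data is the likely cause
and repartition(n) is worth trying; if the counts are even, speculation is
the better test of whether the node is at fault.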

On Tue, Jul 14, 2015 at 11:43 PM, shahid wrote:

> Hi,
> I have a 10-node cluster. I loaded the data onto HDFS, so the number of
> partitions I get is 9. I am running a Spark application and it gets stuck
> on one of the tasks. Looking at the UI, it seems the application is not
> using all nodes to do the calculations. Attached is a screenshot of the
> tasks; it seems tasks are put on each node more than once. Looking at the
> tasks, 8 tasks complete in under 7-8 minutes and one task takes around
> 30 minutes, which causes the delay in results.

Best Regards,
Ayan Guha
