spark-user mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: How to balance task load
Date Thu, 05 Dec 2013 09:54:33 GMT
Hi Hao,

Where tasks go is influenced by where the data they operate on resides.  If
the data is on one executor, it may make more sense to do all the
computation on that node rather than ship data across the network.  How was
the data distributed across your cluster?
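
To illustrate the point, here is a rough sketch in Python. It is a toy
model, not Spark's actual scheduler: the function assign_tasks, the
executor names, and the assumption that all 4 input blocks sit on
executor_0 are hypothetical and chosen only to mirror the situation
described below.

```python
def assign_tasks(block_locations, executors, prefer_locality=True):
    """Toy model: return {executor: number of tasks assigned}.

    block_locations holds, for each task, the executor that stores
    that task's input block.
    """
    load = {e: 0 for e in executors}
    for home in block_locations:
        if prefer_locality and home in load:
            # Run the task where its data already lives (no network transfer).
            target = home
        else:
            # Ignore locality and pick the least-loaded executor.
            target = min(executors, key=lambda e: load[e])
        load[target] += 1
    return load


executors = ["executor_0", "executor_1", "executor_2"]
# Hypothetical layout: all 4 input blocks are stored on executor_0.
blocks = ["executor_0"] * 4

local = assign_tasks(blocks, executors)
spread = assign_tasks(blocks, executors, prefer_locality=False)
print(local)
print(spread)
```

With locality preferred, every task lands on the executor that holds the
data; without it, the tasks spread out but each one would have to fetch
its input over the network. In Spark itself, how long the scheduler waits
for a data-local slot before falling back to a less local one is
controlled by the spark.locality.wait setting.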

Andrew


On Mon, Dec 2, 2013 at 7:52 AM, Hao REN <julien19890118@gmail.com> wrote:

> Sorry for spam.
>
> To complete my previous post:
>
> The map action sometimes creates 4 tasks which are all executed by the
> same executor.
>
> I believe that a task dispatch like:
> executor_0 : 1 task;
> executor_1 : 1 task;
> executor_2 : 2 tasks;
> would give better performance.
>
> Can we force this kind of scheduling in Spark?
>
> Thank you.
>
>
>
> 2013/12/2 Hao REN <julien19890118@gmail.com>
>
>> Hi,
>>
>> When running some tests on EC2 with Spark, I noticed that the tasks are
>> not evenly distributed across executors.
>>
>> For example, a map action produces 4 tasks, but they all go to the
>>
>>
>> Executors (3)
>>
>>    - *Memory:* 0.0 B Used (19.0 GB Total)
>>    - *Disk:* 0.0 B Used
>>
>> Executor ID  Address                              RDD blocks  Memory used     Disk used  Active tasks  Failed tasks  Complete tasks  Total tasks
>> 0            ip-10-10-141-143.ec2.internal:52816  0           0.0 B / 6.3 GB  0.0 B      4             0             0               4
>> 1            ip-10-40-38-190.ec2.internal:60314   0           0.0 B / 6.3 GB  0.0 B      0             0             0               0
>> 2            ip-10-62-35-223.ec2.internal:40500   0           0.0 B / 6.3 GB  0.0 B      0             0             0               0
>>
>
>
> --
> REN Hao
>
> Data Engineer @ ClaraVista
>
> Paris, France
>
> Tel:  +33 06 14 54 57 24
>
