spark-user mailing list archives

From Hao REN <julien19890...@gmail.com>
Subject Re: How to balance task load
Date Thu, 05 Dec 2013 10:27:23 GMT
Hi Andrew,

My data was loaded in HDFS. Actually, I got the answer from the spark-user
google group.

Patrick said:

All cores in the cluster are considered fungible since the tasks are
completely parallel. So until you run out of cores on any given node, it
might get all the tasks.

In some cases this provides *better* performance because you aren't moving
data around as much.
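
To make Patrick's point concrete, here is a toy sketch in Python. It is not Spark's scheduler code, only a simplified model of "any free core will do" versus an even spread. The function names, the executor labels, and the figure of 4 cores per executor are assumptions made for illustration.

```python
# Toy model only: Spark's real scheduler also weighs data locality and
# other factors. All names and the core counts below are hypothetical.

def assign_first_free(num_tasks, cores_per_executor):
    """Hand each task to the first executor that still has a free core."""
    free = dict(cores_per_executor)
    assignment = {executor: 0 for executor in free}
    for _ in range(num_tasks):
        for executor in free:
            if free[executor] > 0:
                free[executor] -= 1
                assignment[executor] += 1
                break
        else:
            break  # no free core anywhere; remaining tasks would have to wait
    return assignment


def assign_spread(num_tasks, cores_per_executor):
    """Hand out tasks round-robin, one per executor per pass."""
    free = dict(cores_per_executor)
    assignment = {executor: 0 for executor in free}
    remaining = num_tasks
    while remaining > 0 and any(c > 0 for c in free.values()):
        for executor in free:
            if remaining == 0:
                break
            if free[executor] > 0:
                free[executor] -= 1
                assignment[executor] += 1
                remaining -= 1
    return assignment


# Assumed cluster: 3 executors with 4 cores each (not stated in the thread).
executors = {"executor_0": 4, "executor_1": 4, "executor_2": 4}

print(assign_first_free(4, executors))
print(assign_spread(4, executors))
```

In this model the first function puts all 4 tasks on executor_0 because it never runs out of cores, which matches what I saw in the web UI. The second one spreads them over the three executors.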

Thank you for your reply. =)


2013/12/5 Andrew Ash <andrew@andrewash.com>

> Hi Hao,
>
> Where tasks go is influenced by where the data they operate on resides.
>  If the data is on one executor, it may make more sense to do all the
> computation on that node rather than ship data across the network.  How was
> the data distributed across your cluster?
>
> Andrew
>
>
> On Mon, Dec 2, 2013 at 7:52 AM, Hao REN <julien19890118@gmail.com> wrote:
>
>> Sorry for spam.
>>
>> To complete my previous post:
>>
>> The map action sometimes creates 4 tasks which are all executed by the
>> same executor.
>>
>> I believe that a task dispatch like:
>> executor_0 : 1 task;
>> executor_1 : 1 task;
>> executor_2 : 2 tasks;
>> would give better performance.
>>
>> Can we force this kind of scheduling in Spark?
>>
>> Thank you.
>>
>>
>>
>> 2013/12/2 Hao REN <julien19890118@gmail.com>
>>
>>> Hi,
>>>
>>> When running some tests on EC2 with Spark, I noticed that the tasks are
>>> not fairly distributed across executors.
>>>
>>> For example, a map action produces 4 tasks, but they all go to the
>>>
>>>
>>> Executors (3)
>>>
>>>    - *Memory:* 0.0 B Used (19.0 GB Total)
>>>    - *Disk:* 0.0 B Used
>>>
>>> Executor ID | Address | RDD blocks | Memory used | Disk used | Active tasks | Failed tasks | Complete tasks | Total tasks
>>> 0 | ip-10-10-141-143.ec2.internal:52816 | 0 | 0.0 B / 6.3 GB | 0.0 B | 4 | 0 | 0 | 4
>>> 1 | ip-10-40-38-190.ec2.internal:60314 | 0 | 0.0 B / 6.3 GB | 0.0 B | 0 | 0 | 0 | 0
>>> 2 | ip-10-62-35-223.ec2.internal:40500 | 0 | 0.0 B / 6.3 GB | 0.0 B | 0 | 0 | 0 | 0
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> REN Hao
>>
>> Data Engineer @ ClaraVista
>>
>> Paris, France
>>
>> Tel:  +33 06 14 54 57 24
>>
>
>


-- 
REN Hao

Data Engineer @ ClaraVista

Paris, France

Tel:  +33 06 14 54 57 24
