spark-user mailing list archives

From Vadim Semenov <va...@datadoghq.com>
Subject Re: Spark EMR executor-core vs Vcores
Date Mon, 26 Feb 2018 21:59:54 GMT
yeah, for some reason (unknown to me, but you can find discussions on the
AWS forums) they double the actual number of cores for NodeManagers.

I assume that's done to maximize utilization, but it doesn't really matter
to me, at least, since I only run Spark. So I, personally, set `total number
of cores - 1/2`, saving one core for the OS/DataNode/NodeManager, because
Spark itself can create a significant load.

On Mon, Feb 26, 2018 at 4:51 PM, Selvam Raman <selmna@gmail.com> wrote:

> Thanks. That makes sense.
>
> I want to know one more thing: the available vcores per machine is 16, but
> there are 8 threads per node. Am I missing something in relating the two?
>
> What I'm thinking now is: number of vcores = number of threads.
>
>
>
> On Mon, 26 Feb 2018 at 18:45, Vadim Semenov <vadim@datadoghq.com> wrote:
>
>> The used cores aren't reported correctly in EMR, and YARN itself has no
>> control over them, so whatever you put in `spark.executor.cores` will be
>> used,
>> but in the ResourceManager you will only see 1 vcore used per NodeManager.
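(Editor's note, not from the thread: the 1-vcore-per-container display is typically what YARN's capacity scheduler shows when it uses its default `DefaultResourceCalculator`, which schedules on memory only. If CPU should be tracked and enforced as well, one option is switching the calculator in `capacity-scheduler.xml`; this is a sketch, not EMR-specific guidance.)

```xml
<!-- capacity-scheduler.xml: schedule on CPU as well as memory -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```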
>>
>> On Mon, Feb 26, 2018 at 5:20 AM, Selvam Raman <selmna@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> spark version - 2.0.0
>>> spark distribution - EMR 5.0.0
>>>
>>> Spark Cluster - one master, 5 slaves
>>>
>>> Master node - m3.xlarge - 8 vCores, 15 GiB memory, 80 GB SSD storage
>>> Slave nodes - m3.2xlarge - 16 vCores, 30 GiB memory, 160 GB SSD storage
>>>
>>>
>>> Cluster Metrics
>>> Apps Submitted: 16
>>> Apps Pending: 0
>>> Apps Running: 1
>>> Apps Completed: 15
>>> Containers Running: 5
>>> Memory Used: 88.88 GB
>>> Memory Total: 90.50 GB
>>> Memory Reserved: 22 GB
>>> VCores Used: 5
>>> VCores Total: 79
>>> VCores Reserved: 1
>>> Active Nodes: 5 <http://localhost:8088/cluster/nodes>
>>> Decommissioning Nodes: 0 <http://localhost:8088/cluster/nodes/decommissioning>
>>> Decommissioned Nodes: 0 <http://localhost:8088/cluster/nodes/decommissioned>
>>> Lost Nodes: 5 <http://localhost:8088/cluster/nodes/lost>
>>> Unhealthy Nodes: 0 <http://localhost:8088/cluster/nodes/unhealthy>
>>> Rebooted Nodes: 0 <http://localhost:8088/cluster/nodes/rebooted>
>>> I have submitted a job with the below configuration:
>>> --num-executors 5 --executor-cores 10 --executor-memory 20g
>>>
>>>
>>>
>>> spark.task.cpus - by default 1
>>>
>>>
>>> My understanding is that there will be 5 executors, each able to run 10
>>> tasks at a time, with the tasks sharing the executor's total memory of
>>> 20g. Here, I could see only 5 vcores used, which means 1 executor
>>> instance uses 20g + 10% overhead RAM (22 GB), 10 cores (number of
>>> threads), and 1 vcore (CPU).
>>>
>>> Please correct me if my understanding is wrong.
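The 22 GB container figure can be checked against Spark 2.x's default YARN memory overhead rule, max(384 MB, 10% of executor memory); a quick sketch (the helper name is mine):

```python
# YARN container size for one executor under Spark 2.x defaults:
# container = executor memory + max(384 MB, 10% of executor memory).
def yarn_container_mb(executor_memory_mb):
    overhead = max(384, int(executor_memory_mb * 0.10))
    return executor_memory_mb + overhead

# --executor-memory 20g, as in the spark-submit above:
print(yarn_container_mb(20 * 1024))  # 22528 MB, i.e. the ~22 GB observed
```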
>>>
>>> How can I utilize the vcores in EMR effectively? Will more vcores boost
>>> performance?
>>>
>>>
>>> --
>>> Selvam Raman
>>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>>>
>>
