spark-dev mailing list archives

From Zoltán Zvara <zoltan.zv...@gmail.com>
Subject Re: Spark Executor resources
Date Tue, 24 Mar 2015 15:48:41 GMT
I'm trying to log Tasks to understand the physical plan and to visualize which
partition of which RDD is currently being computed, from which creation site,
along with other information. I want to instrument the TaskRunner to do this
just before it invokes runTask() on the Task, and again just before the Task
is handed to the GC, when metrics are collected. Along with the information I
wish to log, I want to report the resources the Executor has been allocated to
run its Tasks.
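To make the rounding behavior discussed further down in this thread concrete, here is a minimal Python sketch (not Spark code) of how a YARN container request for an executor might be sized. The overhead default of max(384 MB, 10% of executor memory) matches Spark 1.x-era behavior, and the scheduler settings are illustrative defaults, not values from this thread:

```python
import math

def yarn_container_memory_mb(executor_memory_mb,
                             memory_overhead_mb=None,
                             min_allocation_mb=1024,
                             increment_mb=512):
    """Sketch of how YARN might size an executor container (assumed defaults)."""
    if memory_overhead_mb is None:
        # Spark ~1.x default: max(384 MB, 10% of spark.executor.memory)
        memory_overhead_mb = max(384, int(0.10 * executor_memory_mb))
    requested = executor_memory_mb + memory_overhead_mb
    # YARN rounds the request up to a multiple of
    # yarn.scheduler.increment-allocation-mb, subject to
    # yarn.scheduler.minimum-allocation-mb.
    return max(min_allocation_mb,
               math.ceil(requested / increment_mb) * increment_mb)

# With a 2 GB executor: 2048 + 384 = 2432 MB requested, rounded up to 2560 MB.
print(yarn_container_memory_mb(2048))
```

This is exactly the gap Sandy points out: the executor can recompute the requested amount from its own configuration, but the rounded-up container size depends on scheduler properties it may not see.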

Zvara Zoltán



mail, hangout, skype: zoltan.zvara@gmail.com

mobile, viber: +36203129543

bank: 10918001-00000021-50480008

address: Hungary, 2475 Kápolnásnyék, Kossuth 6/a

elte: HSKSJZ (ZVZOAAI.ELTE)

2015-03-24 16:42 GMT+01:00 Sandy Ryza <sandy.ryza@cloudera.com>:

> That's correct.  What's the reason this information is needed?
>
> -Sandy
>
> On Tue, Mar 24, 2015 at 11:41 AM, Zoltán Zvara <zoltan.zvara@gmail.com>
> wrote:
>
>> Thank you for your response!
>>
>> I guess the (Spark)AM, which gives the container lease to the NM (along
>> with the executor JAR and the command to run), must know how much CPU and
>> RAM that container is capped and isolated at. If I'm right, there must be a
>> resource vector along with the encrypted container lease that describes
>> this. Or is there a way for the ExecutorBackend to fetch this information
>> directly from the environment? Then the ExecutorBackend would be able to
>> hand this information over to the actual Executor, which creates the
>> TaskRunner.
>>
>>
>> 2015-03-24 16:30 GMT+01:00 Sandy Ryza <sandy.ryza@cloudera.com>:
>>
>>> Hi Zoltan,
>>>
>>> If running on YARN, the YARN NodeManager starts executors.  I don't
>>> think there's a 100% precise way for the Spark executor to know how
>>> many resources are allotted to it.  It can come close by looking at the
>>> Spark configuration options used to request it (spark.executor.memory and
>>> spark.yarn.executor.memoryOverhead), but it can't necessarily account for
>>> the amount that YARN has rounded the request up by if those configuration
>>> properties (yarn.scheduler.minimum-allocation-mb and
>>> yarn.scheduler.increment-allocation-mb) are not present on the node.
>>>
>>> -Sandy
>>>
>>> On Mon, Mar 23, 2015 at 5:08 PM, Zoltán Zvara <zoltan.zvara@gmail.com>
>>> wrote:
>>>
>>>> Let's say I'm an Executor instance in a Spark system. Who started me,
>>>> and where, when I run on a worker node supervised by (a) Mesos, (b)
>>>> YARN? I suppose I'm the only Executor on a worker node for a given
>>>> framework scheduler (driver). If I'm an Executor instance, which is the
>>>> closest object to me that can tell me how many resources I have on (a)
>>>> Mesos, (b) YARN?
>>>>
>>>> Thank you for your kind input!
>>>>
>>>>
>>>
>>>
>>
>
