spark-dev mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: Lost executor on YARN ALS iterations
Date Thu, 21 Aug 2014 23:47:07 GMT
Sandy,

I put spark.yarn.executor.memoryOverhead 1024 in spark-defaults.conf, but I
don't see the property listed under Spark Properties on the web UI's
Environment page.

Does it need to go in spark-env.sh?

Thanks.
Deb


On Wed, Aug 20, 2014 at 12:39 AM, Sandy Ryza <sandy.ryza@cloudera.com>
wrote:

> Hi Debasish,
>
> The fix is to raise spark.yarn.executor.memoryOverhead until this goes
> away.  This controls the buffer between the JVM heap size and the amount of
> memory requested from YARN (JVMs can take up memory beyond their heap
> size). You should also make sure that, in the YARN NodeManager
> configuration, yarn.nodemanager.vmem-check-enabled is set to false.
>
> -Sandy
>
>
> On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das <debasish.das83@gmail.com>
> wrote:
>
>> I could reproduce the issue in both 1.0 and 1.1 using YARN, so this is
>> definitely a YARN-related problem.
>>
>> At least for me, the only deployment option possible right now is
>> standalone.
>>
>>
>>
>> On Tue, Aug 19, 2014 at 11:29 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>
>>> Hi Deb,
>>>
>>> I think this may be the same issue as described in
>>> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
>>> container got killed by YARN because it used much more memory than it
>>> requested. But we haven't figured out the root cause yet.
>>>
>>> +Sandy
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das <debasish.das83@gmail.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > During the 4th ALS iteration, I am noticing that one of the executors
>>> > gets disconnected:
>>> >
>>> > 14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
>>> > SendingConnectionManagerId not found
>>> >
>>> > 14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
>>> > disconnected, so removing it
>>> >
>>> > 14/08/19 23:40:00 ERROR cluster.YarnClientClusterScheduler: Lost executor 5
>>> > on tblpmidn42adv-hdp.tdc.vzwcorp.com: remote Akka client disassociated
>>> >
>>> > 14/08/19 23:40:00 INFO scheduler.DAGScheduler: Executor lost: 5 (epoch 12)
>>> >
>>> > Any idea if this is a bug related to Akka on YARN?
>>> >
>>> > I am using master
>>> >
>>> > Thanks.
>>> > Deb
>>>
>>
>>
>
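
Sandy's description of spark.yarn.executor.memoryOverhead as the buffer
between the JVM heap size and the amount of memory requested from YARN can be
sketched roughly as below. This is only an illustration of the arithmetic, not
Spark's or YARN's actual code: the function names, the 8192 MB heap, the 384 MB
"small" overhead and the 9000 MB process footprint are hypothetical values
chosen for the example, and any rounding YARN applies to the request is
ignored. Only the 1024 MB overhead comes from the thread.

```python
def container_request_mb(executor_heap_mb, memory_overhead_mb):
    """Memory asked of YARN: JVM heap plus the overhead buffer (simplified)."""
    return executor_heap_mb + memory_overhead_mb


def exceeds_container(process_mb, executor_heap_mb, memory_overhead_mb):
    """True if the executor process has outgrown its container request.

    A JVM can use memory beyond its heap, so process_mb may be larger than
    executor_heap_mb; the overhead is what absorbs that difference.
    """
    return process_mb > container_request_mb(executor_heap_mb, memory_overhead_mb)


heap_mb = 8192        # hypothetical executor heap
process_mb = 9000     # hypothetical total footprint of the executor process

# With a small (hypothetical) overhead the process outgrows its container.
print(exceeds_container(process_mb, heap_mb, 384))

# With the 1024 MB overhead mentioned in this thread it fits.
print(exceeds_container(process_mb, heap_mb, 1024))
```

Under these assumed numbers, raising the overhead from 384 to 1024 moves the
container request from 8576 MB to 9216 MB, which is why increasing the setting
"until this goes away" can stop the executor from being killed.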
