spark-issues mailing list archives

From "Hyukjin Kwon (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-25563) Spark application hangs If container allocate on lost Nodemanager
Date Tue, 08 Oct 2019 05:42:13 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-25563.
----------------------------------
    Resolution: Incomplete

> Spark application hangs If container allocate on lost Nodemanager
> -----------------------------------------------------------------
>
>                 Key: SPARK-25563
>                 URL: https://issues.apache.org/jira/browse/SPARK-25563
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.1
>            Reporter: devinduan
>            Priority: Minor
>              Labels: bulk-closed
>
>     I ran into an issue: when I start a Spark application in YARN client mode, the application sometimes hangs.
>     I checked the application logs: a container was allocated on a lost NodeManager, but the AM did not retry starting another executor.
>     My Spark version is 2.3.1.
>     Here is my ApplicationMaster log.
>  
> 2018-09-26 05:21:15 INFO YarnRMClient:54 - Registering the ApplicationMaster
> 2018-09-26 05:21:15 INFO ConfiguredRMFailoverProxyProvider:100 - Failing over to rm2
> 2018-09-26 05:21:15 WARN Utils:66 - spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
> 2018-09-26 05:21:15 INFO Utils:54 - Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
> 2018-09-26 05:21:15 INFO YarnAllocator:54 - Will request 1 executor container(s), each with 24 core(s) and 20275 MB memory (including 1843 MB of overhead)
> 2018-09-26 05:21:15 INFO YarnAllocator:54 - Submitted 1 unlocalized container requests.
> 2018-09-26 05:21:15 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
> 2018-09-26 05:21:27 WARN YarnAllocator:66 - Cannot find executorId for container: container_1532951609168_4721728_01_000002
> 2018-09-26 05:21:27 INFO YarnAllocator:54 - Completed container container_1532951609168_4721728_01_000002 (state: COMPLETE, exit status: -100)
> 2018-09-26 05:21:27 WARN YarnAllocator:66 - Container marked as failed: container_1532951609168_4721728_01_000002. Exit status: -100. Diagnostics: Container released on a *lost* node
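
For triaging logs like the one quoted above, the following is a minimal sketch (not part of the original report) that scans ApplicationMaster log text for containers reported as failed on a lost node. The function name `find_lost_node_containers` is hypothetical, and the pattern assumes the exact message format shown in this log; other Spark or YARN versions may word the message differently.

```python
import re

# Hypothetical helper; not part of Spark or YARN.
# The pattern assumes the "Container marked as failed" message format
# seen in the log above. \s+ is used between fields so the match still
# works when a mail archive has wrapped the line.
LOST_NODE_RE = re.compile(
    r"Container marked as failed:\s+(container_\w+)\.\s+"
    r"Exit status:\s+(-?\d+)\.\s+"
    r"Diagnostics:\s+Container released on a \*lost\* node"
)


def find_lost_node_containers(log_text):
    """Return (container_id, exit_status) pairs for lost-node failures."""
    return [
        (match.group(1), int(match.group(2)))
        for match in LOST_NODE_RE.finditer(log_text)
    ]
```

Run against the excerpt above, this would be expected to pick up the single container that completed with exit status -100; an empty result means no message in that format was found, not necessarily that no node was lost.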



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


