spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-19226) Report failure reason from Reporter Thread
Date Mon, 13 Feb 2017 11:28:41 GMT

     [ https://issues.apache.org/jira/browse/SPARK-19226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-19226.
-------------------------------
    Resolution: Not A Problem

Yes, the solution seems to be to not let the number of executors increase so much by setting
a max. You aren't showing the underlying error, though exactly what it is won't matter. If
you're saying the error should have more detail, you're looking in the wrong place. If it's
a question, then this shouldn't be a JIRA.

> Report failure reason from Reporter Thread 
> -------------------------------------------
>
>                 Key: SPARK-19226
>                 URL: https://issues.apache.org/jira/browse/SPARK-19226
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>    Affects Versions: 2.0.2
>         Environment: emr-5.2.1 with Zeppelin 0.6.2/Spark2.0.2 and 10 r3.xl core nodes
>            Reporter: Maheedhar Reddy Chappidi
>            Priority: Minor
>
> With the exponential [1] increase in executor count, the Reporter thread [2] fails without a proper message.
> ==
> 17/01/12 09:33:44 INFO YarnAllocator: Driver requested a total number of 32767 executor(s).
> 17/01/12 09:33:44 INFO YarnAllocator: Will request 24576 executor containers, each with 2 cores and 5632 MB memory including 512 MB overhead
> 17/01/12 09:33:44 INFO YarnAllocator: Canceled 0 container requests (locality no longer needed)
> 17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34419 executor(s).
> 17/01/12 09:33:52 INFO ApplicationMaster: Final app status: FAILED, exitCode: 12, (reason: Exception was thrown 1 time(s) from Reporter thread.)
> 17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34410 executor(s).
> 17/01/12 09:33:52 INFO YarnAllocator: Driver requested a total number of 34409 executor(s).
> 17/01/12 09:33:52 INFO ShutdownHookManager: Shutdown hook called
> ==
> We were able to run the workflows by limiting the maximum executor count (spark.dynamicAllocation.maxExecutors) to avoid further requests (35k->65k).
> Additionally, I don't see any issues with the ApplicationMaster's container memory/compute.
> Is it possible to report a more detailed error reason from the if/else?
> [1]  https://github.com/apache/spark/blob/6ee28423ad1b2e6089b82af64a31d77d3552bb38/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala
> [2] https://github.com/apache/spark/blob/01e14bf303e61a5726f3b1418357a50c1bf8b16f/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L446-L480
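
The workaround discussed above (capping the executor count) can be illustrated with a small sketch. It is not Spark's code: it only assumes that, while tasks are backlogged, the number of executors added doubles each round (1, 2, 4, ...) and that the running total is clamped to a maximum. The function name simulate_targets and the cap of 100 are hypothetical, chosen for illustration. Under that assumption, 15 rounds starting from zero give 2^15 - 1 = 32767, the same figure as the first request in the log.

```python
def simulate_targets(max_executors=None, initial=0, rounds=15):
    """Return the total executor target after each round.

    Assumption (not taken from Spark's source): the batch added doubles
    every round, and the total is clamped to max_executors when a cap is set.
    """
    total = initial
    to_add = 1
    targets = []
    for _ in range(rounds):
        total += to_add
        if max_executors is not None and total >= max_executors:
            # Cap reached: record the cap and stop asking for more.
            targets.append(max_executors)
            break
        targets.append(total)
        to_add *= 2  # exponential ramp-up while the backlog persists
    return targets


if __name__ == "__main__":
    print(simulate_targets()[-1])               # uncapped, 15 rounds
    print(simulate_targets(max_executors=100))  # hypothetical cap of 100
```

With a cap in place the target stops growing after a handful of rounds, which is the effect the reporter describes after setting spark.dynamicAllocation.maxExecutors.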



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


