spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-23974) Do not allocate more containers as expected in dynamic allocation
Date Thu, 07 Feb 2019 18:13:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-23974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-23974.
------------------------------------
    Resolution: Not A Problem

Closing based on the above comment.

> Do not allocate more containers as expected in dynamic allocation
> -----------------------------------------------------------------
>
>                 Key: SPARK-23974
>                 URL: https://issues.apache.org/jira/browse/SPARK-23974
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.1
>            Reporter: Darcy Shen
>            Priority: Major
>
> Using YARN with dynamic allocation enabled, Spark does not allocate more containers even though the current number of containers (executors) is below the maximum executor count.
> For example, only 7 executors are working while our cluster is not busy, even though I have set
> {{spark.dynamicAllocation.maxExecutors = 600}}
> and the current jobs of the context are executing slowly.
>  
> A live case with online logs:
> ```
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | tail
> [2018-04-12 16:07:19,070] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:20,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:21,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:22,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:23,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:24,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:25,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:26,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:27,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 16:07:28,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | head
> [2018-04-12 13:52:18,067] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:19,071] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:20,072] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:21,073] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:22,074] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:23,075] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:24,076] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:25,077] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:26,078] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> [2018-04-12 13:52:27,079] DEBUG .ExecutorAllocationManager [] [akka://JobServer/user/jobManager] - Not adding executors because our current target total is already 600 (limit 600)
> $ grep "Not adding executors because our current target total" spark-job-server.log.9 | wc -l
> 8111
> ```
> These logs show that `numExecutorsTarget == maxNumExecutors == 600` is being maintained without any new executors being requested, yet at that time only 7 executors were available to our users.
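
For context, a minimal dynamic-allocation setup of the kind the report describes might look like the following sketch of `spark-defaults.conf` entries. Only `spark.dynamicAllocation.maxExecutors = 600` is taken from the report; the other values are illustrative assumptions, not settings from the reporter's cluster:

```
# Assumed baseline for dynamic allocation on YARN; only maxExecutors comes from the report.
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true   # external shuffle service, required by dynamic allocation on YARN in Spark 2.x
spark.dynamicAllocation.maxExecutors   600
spark.dynamicAllocation.minExecutors   1      # hypothetical floor, not stated in the report
```

With a configuration like this, `ExecutorAllocationManager` stops requesting executors once its internal target reaches `maxExecutors`, which is exactly the "current target total is already 600 (limit 600)" message above. A target of 600 combined with only 7 running executors would then indicate that the resource manager is not granting the requested containers (e.g. due to queue or cluster capacity), rather than Spark failing to ask for them; that reading is consistent with the "Not A Problem" resolution, though the deciding comment is not included in this message.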



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

