hadoop-yarn-dev mailing list archives

From "zhengchenyu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (YARN-6568) A queue which runs a long time job couldn't acquire any container for long time.
Date Mon, 08 May 2017 08:05:04 GMT
zhengchenyu created YARN-6568:

             Summary: A queue which runs a long time job couldn't acquire any container for long time.
                 Key: YARN-6568
                 URL: https://issues.apache.org/jira/browse/YARN-6568
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler
    Affects Versions: 2.7.1
         Environment: CentOS 7.1
            Reporter: zhengchenyu
             Fix For: 2.7.4

In our cluster, we found that some applications could not acquire any container for a long time.
(Note: we use FairScheduler with FairSharePolicy.)

First, I found an unreasonable configuration: we had set minRes=maxRes. As a result, some
applications stayed pending for a long time, and we killed some large applications to work
around the problem. After we changed this configuration, the problem eased.

But the problem is not completely solved. In our cluster, I found that applications in some
queues that request only a few containers stay pending for a long time.

I reproduced this in a test cluster. I submitted a DistributedShell application that runs many
loop applications to queueA, then submitted my own YARN application, which constantly requests
and releases containers, to queueB. From then on, any applications submitted to queueA stay pending!

We know this is a problem with FairSharePolicy: it takes each queue's request into account when
ordering. So after the queues are sorted, queues that have few requests are always ordered last.
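
To make the ordering effect concrete, here is a minimal sketch of a fair-share style comparator. It is a simplified, memory-only model written for illustration and is not the Hadoop source; the class FairOrderSketch, its Queue fields, and all numbers in the usage note are assumptions. It caps each queue's minimum share by its demand, places queues that are below that capped share first, and otherwise orders by usage divided by weight.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified, memory-only model of fair-share ordering. Hypothetical names;
// this is an illustration, not the FairSharePolicy implementation.
class FairOrderSketch {
    static final class Queue {
        final String name;
        final long minShareMb; // configured minimum share
        final long usageMb;    // memory currently held by the queue
        final long demandMb;   // usage plus pending requests
        final double weight;

        Queue(String name, long minShareMb, long usageMb, long demandMb, double weight) {
            this.name = name;
            this.minShareMb = minShareMb;
            this.usageMb = usageMb;
            this.demandMb = demandMb;
            this.weight = weight;
        }
    }

    // Queues below their demand-capped min share ("needy") sort first.
    static final Comparator<Queue> FAIR_SHARE = (q1, q2) -> {
        long min1 = Math.min(q1.minShareMb, q1.demandMb);
        long min2 = Math.min(q2.minShareMb, q2.demandMb);
        boolean needy1 = q1.usageMb < min1;
        boolean needy2 = q2.usageMb < min2;
        if (needy1 && !needy2) return -1;
        if (needy2 && !needy1) return 1;
        if (needy1) {
            // Both needy: the queue further below its capped min share goes first.
            double r1 = (double) q1.usageMb / Math.max(min1, 1);
            double r2 = (double) q2.usageMb / Math.max(min2, 1);
            return Double.compare(r1, r2);
        }
        // Neither needy: the lower usage-to-weight ratio goes first.
        return Double.compare(q1.usageMb / q1.weight, q2.usageMb / q2.weight);
    };

    // Returns queue names in the order the scheduler would offer resources.
    static List<String> order(List<Queue> queues) {
        List<Queue> sorted = new ArrayList<>(queues);
        sorted.sort(FAIR_SHARE);
        List<String> names = new ArrayList<>();
        for (Queue q : sorted) {
            names.add(q.name);
        }
        return names;
    }
}
```

With made-up numbers, a queueA that holds 8192 MB for a long-running job and has one pending 1024 MB request (min share 4096, demand 9216) is compared against a queueB that has just released containers (min share 4096, usage 1024, demand 20480). queueB is below its min share and queueA is not, so the order is [queueB, queueA]. As long as queueB keeps releasing containers, it can stay in front on every sort.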

We know that once the AM container is launched, the application's requests will increase. But
FairSharePolicy can't distinguish which request is an AM request. I think that if the AM
container is assigned, the problem is solved.

My colleagues and I discussed this problem. We recommend setting a timeout for each queue, where
the measured time is how long the queue has gone without being assigned a container. If the
timeout is exceeded, we move this queue to the first place among the queues.
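
The timeout idea could be sketched as a comparator that wraps the normal ordering. This is only an illustration of the proposal, not a patch: the class StarvationTimeoutSketch, the baseRank stand-in for the policy's usual order, and the rule that only queues with pending requests count as starved are all assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the proposed per-queue timeout. Hypothetical names throughout.
class StarvationTimeoutSketch {
    static final class Queue {
        final String name;
        final int baseRank;        // position the normal policy would give (0 = first)
        final long lastAssignedMs; // last time the queue was assigned a container
        final boolean hasPending;  // true if the queue has unsatisfied requests

        Queue(String name, int baseRank, long lastAssignedMs, boolean hasPending) {
            this.name = name;
            this.baseRank = baseRank;
            this.lastAssignedMs = lastAssignedMs;
            this.hasPending = hasPending;
        }
    }

    // Assumption: a queue is starved if it has pending requests and has gone
    // longer than timeoutMs without being assigned a container.
    static boolean starved(Queue q, long nowMs, long timeoutMs) {
        return q.hasPending && nowMs - q.lastAssignedMs > timeoutMs;
    }

    // Starved queues go first (longest wait first); otherwise fall back to base order.
    static Comparator<Queue> withTimeout(Comparator<Queue> base, long nowMs, long timeoutMs) {
        return (q1, q2) -> {
            boolean s1 = starved(q1, nowMs, timeoutMs);
            boolean s2 = starved(q2, nowMs, timeoutMs);
            if (s1 != s2) return s1 ? -1 : 1;
            if (s1) {
                int c = Long.compare(q1.lastAssignedMs, q2.lastAssignedMs);
                if (c != 0) return c;
            }
            return base.compare(q1, q2);
        };
    }

    // Returns queue names in scheduling order for the given clock and timeout.
    static List<String> order(List<Queue> queues, long nowMs, long timeoutMs) {
        Comparator<Queue> base = Comparator.comparingInt(q -> q.baseRank);
        List<Queue> sorted = new ArrayList<>(queues);
        sorted.sort(withTimeout(base, nowMs, timeoutMs));
        List<String> names = new ArrayList<>();
        for (Queue q : sorted) {
            names.add(q.name);
        }
        return names;
    }
}
```

With an illustrative 5000 ms timeout at time 10000, a queueA last assigned at time 0 is promoted ahead of a queueB last assigned at time 9000, even though the base order puts queueB first. With a 60000 ms timeout neither queue is starved and the base order is kept. A real change would also need to choose the timeout value and decide how promotion interacts with min-share guarantees, which this sketch does not address.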

This message was sent by Atlassian JIRA
