flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5542) YARN client incorrectly uses local YARN config to check vcore capacity
Date Mon, 08 Oct 2018 08:51:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16641541#comment-16641541 ]

ASF GitHub Bot commented on FLINK-5542:

leanken opened a new pull request #6775: [FLINK-5542] use YarnCluster vcores setting to do MaxVCore validation
URL: https://github.com/apache/flink/pull/6775
   ## What is the purpose of the change
   See [FLINK-5542](https://issues.apache.org/jira/browse/FLINK-5542).
   Use the YarnCluster vcores setting to do MaxVCore validation instead of using the local yarn conf.
   ## Brief change log
   - *Fetch MaxVCore num via yarnClient instead of using the local yarn conf*
   ## Verifying this change
   This change is a trivial rework / code cleanup without any test coverage.
   However, I reproduced the [Error/Exception](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html) and confirmed that it works as expected after the fix.
   ## Does this pull request potentially affect one of the following parts:
     - Dependencies (does it add or upgrade a dependency): (no)
     - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
     - The serializers: (no)
     - The runtime per-record code paths (performance sensitive): (no)
     - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
     - The S3 file system connector: (no)
   ## Documentation
     - Does this pull request introduce a new feature? (no)
     - If yes, how is the feature documented? (not documented)

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

> YARN client incorrectly uses local YARN config to check vcore capacity
> ----------------------------------------------------------------------
>                 Key: FLINK-5542
>                 URL: https://issues.apache.org/jira/browse/FLINK-5542
>             Project: Flink
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 1.1.4, 1.5.3, 1.6.0, 1.7.0
>            Reporter: Shannon Carey
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.7.0
> See http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html
> When using bin/yarn-session.sh, AbstractYarnClusterDescriptor line 271 in 1.1.4 compares the user's selected number of vcores to the vcores configured in the local node's YARN config (from YarnConfiguration, e.g. yarn-site.xml and yarn-default.xml). It incorrectly prevents Flink from launching even if there is sufficient vcore capacity on the cluster.
> That is not correct, because the application will not necessarily run on the local node. For example, if running the yarn-session.sh client from the AWS EMR master node, the vcore count there may be different from the vcore count on the core nodes where Flink will actually run.
> A reasonable way to fix this would probably be to reuse the logic from "yarn-session.sh -q" (FlinkYarnSessionCli line 550), which knows how to get vcore information from the real worker nodes. Alternatively, perhaps we could remove the check entirely and rely on YARN's Scheduler to determine whether sufficient resources exist.
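
To make the difference between the two checks concrete, here is a minimal, self-contained Java sketch. It is not the code from the pull request or from Flink: `VcoreCheck`, `NodeInfo`, and all method names are hypothetical stand-ins, and the node list is hard-coded. In a real client the per-node capacities would presumably come from the YARN client's node reports (for example `YarnClient.getNodeReports`), and the local value from the local YARN configuration.

```java
import java.util.Arrays;
import java.util.List;

public class VcoreCheck {

    /** Hypothetical stand-in for a per-node capacity report from the cluster. */
    static final class NodeInfo {
        final String host;
        final int vcores;

        NodeInfo(String host, int vcores) {
            this.host = host;
            this.vcores = vcores;
        }
    }

    /** Largest vcore capacity reported by any node; 0 if no nodes are known. */
    static int maxClusterVcores(List<NodeInfo> nodes) {
        int max = 0;
        for (NodeInfo node : nodes) {
            max = Math.max(max, node.vcores);
        }
        return max;
    }

    /** Check described in the issue: compares against the local config only. */
    static boolean fitsLocalConfig(int requestedVcores, int localConfiguredVcores) {
        return requestedVcores <= localConfiguredVcores;
    }

    /** Proposed direction: compare against what the worker nodes actually report. */
    static boolean fitsOnCluster(int requestedVcores, List<NodeInfo> nodes) {
        return requestedVcores <= maxClusterVcores(nodes);
    }

    public static void main(String[] args) {
        // Assumed example values: a small master node and larger core nodes.
        int localConfiguredVcores = 4;
        List<NodeInfo> workers = Arrays.asList(
                new NodeInfo("core-1", 16),
                new NodeInfo("core-2", 16));
        int requestedVcores = 8;

        System.out.println("local check passes:   "
                + fitsLocalConfig(requestedVcores, localConfiguredVcores));
        System.out.println("cluster check passes: "
                + fitsOnCluster(requestedVcores, workers));
    }
}
```

With these assumed numbers the local check rejects a request that the worker nodes could satisfy, which is the situation the issue describes for a client started on an EMR master node.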

This message was sent by Atlassian JIRA