flink-issues mailing list archives

From "Bill Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5668) Reduce dependency on HDFS at job startup time
Date Mon, 27 Feb 2017 17:55:46 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886220#comment-15886220 ]

Bill Liu commented on FLINK-5668:

[~wheat9] and I are implementing a Flink job deployer for YARN that works with `HttpFs`
and `S3`.
The YARN container can resolve the `http` and `s3` file schemes.

We use `HttpFs` instead of `HDFS` to bootstrap the JobManager.
Here is the code that sets up the AM container (JobManager):
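    // Resolve the FileSystem from the resource path itself (here: http).
    Path resourcePath = new Path("http://localhost:19989/flink-dist.jar");
    FileStatus fileStatus =
                    resourcePath.getFileSystem(yarnConfiguration).getFileStatus(resourcePath);
    LOG.info("resource {}", ConverterUtils.getYarnUrlFromPath(resourcePath));
    // Register the jar as a YARN LocalResource so the container localizes it.
    LocalResource packageResource = LocalResource.newInstance(
                    ConverterUtils.getYarnUrlFromPath(resourcePath),
                    LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
                    fileStatus.getLen(), fileStatus.getModificationTime());
    LOG.info("add localresource {}", packageResource);
    localResources.put("flink.jar", packageResource);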
`yarn.deploy.fs` is not a good idea, because these bootstrap jars/files may be located on
different filesystems.
It's better to parse each jar's Path to determine the underlying filesystem of that jar.
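
To illustrate the per-path idea, here is a minimal sketch that uses only `java.net.URI`, so it runs without Hadoop on the classpath. The class `ResourceSchemes`, its methods, the `hdfs` fallback for scheme-less paths, and the `s3://bucket/...` path in the usage note are all hypothetical and are not Flink or Hadoop API. In a real deployer, `Path.getFileSystem(Configuration)` would do this resolution for each path.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: derives the filesystem scheme from each resource path
// instead of reading one global setting.
final class ResourceSchemes {
    // Assumption: scheme-less paths fall back to this default, standing in
    // for whatever the cluster's default filesystem would be.
    static final String DEFAULT_SCHEME = "hdfs";

    // Returns the lower-cased URI scheme of the path, or the default if absent.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? DEFAULT_SCHEME : scheme.toLowerCase(Locale.ROOT);
    }

    // Groups paths by scheme, preserving first-seen order, so each group can
    // be handled by the filesystem that matches it.
    static Map<String, List<String>> groupByScheme(List<String> paths) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String p : paths) {
            grouped.computeIfAbsent(schemeOf(p), k -> new ArrayList<>()).add(p);
        }
        return grouped;
    }
}
```

With this, `http://localhost:19989/flink-dist.jar` and `s3://bucket/lib.jar` end up in separate groups, which a single filesystem setting cannot express.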

> Reduce dependency on HDFS at job startup time
> ---------------------------------------------
>                 Key: FLINK-5668
>                 URL: https://issues.apache.org/jira/browse/FLINK-5668
>             Project: Flink
>          Issue Type: Improvement
>          Components: YARN
>            Reporter: Bill Liu
>   Original Estimate: 48h
>  Remaining Estimate: 48h
> When creating a Flink cluster on YARN, the JobManager depends on HDFS to share taskmanager-conf.yaml
> with the TaskManagers.
> It would be better to serve taskmanager-conf.yaml from the JobManager web server instead of HDFS,
> which would reduce the HDFS dependency at job startup.

This message was sent by Atlassian JIRA
