flink-issues mailing list archives

From eastcirclek <...@git.apache.org>
Subject [GitHub] flink issue #5307: [FLINK-8431] [mesos] Allow to specify # GPUs for TaskMana...
Date Tue, 30 Jan 2018 02:00:55 GMT
Github user eastcirclek commented on the issue:

    As you pointed out, the discussion we had on the mailing list was about the JM not starting
TMs on GPU-equipped agents. It turned out that a Mesos framework needs to specify the `GPU_RESOURCES`
capability if it wants to get resource offers that contain GPUs [[link]](http://mesos.apache.org/documentation/latest/gpu-support/#framework-capabilities).
I managed to start TMs on the GPU-equipped agents by setting the master flag `--filter_gpu_resources`
to false when starting the Mesos master. [MESOS-7576](https://issues.apache.org/jira/browse/MESOS-7576)
introduces `--filter_gpu_resources`; when the flag is set to false, Mesos frameworks that
do not have the `GPU_RESOURCES` capability can receive offers that contain GPUs from the Mesos
master. So that problem could be solved without modifying Flink.
    The reason I created [FLINK-8431](https://issues.apache.org/jira/browse/FLINK-8431) to
allow specifying the number of GPUs is that TMs will not see any GPUs if they do not request them
explicitly and GPUs are isolated, as shown in [link](http://mesos.apache.org/documentation/latest/gpu-support/#agent-flags).
    Regarding your question,
    > Is the original problem which we want to solve that Flink does not use agents which
have GPU resources or that Flink cannot specify the number of GPUs it requires to run? It
looks as if the PR solves the latter ...
    Yes, the scope of FLINK-8431 and this PR is confined to the latter.
    > but I was wondering whether we shouldn't solve the former problem.
    I don't think we need to take care of the former anymore because `GPU_RESOURCES` is going
to be deprecated in favor of the reservation mechanism, as shown in [link](https://www.mail-archive.com/dev@mesos.apache.org/msg37571.html)
and [MESOS-7576](https://issues.apache.org/jira/browse/MESOS-7576). Thus, we no longer need to split
servers into two categories (CPU-only servers and GPU-equipped servers). Nevertheless,
we need to specify `GPU_RESOURCES` until it is completely deprecated in Mesos 2.x. To this
end, I add the `GPU_RESOURCES` capability if the number of GPUs is greater than 0.
    For those who are in a situation in which the JM does not get offers that contain GPUs, I'd
like to suggest restarting the Mesos master with `--filter_gpu_resources` set to false, as
explained above.
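
    To illustrate the rule described above (add `GPU_RESOURCES` only when the number of GPUs is greater than 0), here is a minimal, self-contained sketch. It is not the code in this PR: the `Capability` enum is a hypothetical stand-in for the Mesos framework capability type, and `capabilitiesFor` and its `gpus` parameter are names made up for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

public class GpuCapabilitySketch {

    /** Hypothetical stand-in for the Mesos framework capability types. */
    enum Capability { GPU_RESOURCES }

    /**
     * Returns the capabilities a framework would declare for the given
     * number of GPUs requested per TaskManager (hypothetical parameter).
     */
    static Set<Capability> capabilitiesFor(int gpus) {
        if (gpus < 0) {
            throw new IllegalArgumentException("gpus must not be negative: " + gpus);
        }
        Set<Capability> capabilities = EnumSet.noneOf(Capability.class);
        // Declare GPU_RESOURCES only when GPUs are actually requested.
        if (gpus > 0) {
            capabilities.add(Capability.GPU_RESOURCES);
        }
        return capabilities;
    }

    public static void main(String[] args) {
        System.out.println(capabilitiesFor(0)); // prints []
        System.out.println(capabilitiesFor(2)); // prints [GPU_RESOURCES]
    }
}
```

    With this rule, a setup that requests no GPUs declares no extra capability and behaves as before.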

