ignite-dev mailing list archives

From "Anton Dmitriev (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-10133) ML: Switch to per-node TensorFlow worker strategy
Date Fri, 02 Nov 2018 15:20:01 GMT
Anton Dmitriev created IGNITE-10133:

             Summary: ML: Switch to per-node TensorFlow worker strategy
                 Key: IGNITE-10133
                 URL: https://issues.apache.org/jira/browse/IGNITE-10133
             Project: Ignite
          Issue Type: Improvement
          Components: ml
    Affects Versions: 2.8
            Reporter: Anton Dmitriev
            Assignee: Anton Dmitriev
             Fix For: 2.8

Currently we start a TensorFlow worker process for every cache partition. If a node is equipped
with a GPU and TensorFlow uses it, a worker process acquires all of the GPU memory. If two worker
processes on the same node try to acquire all of the GPU memory, they fail.

To eliminate this problem and allow users to utilize the GPU during training, we need to switch
to a per-node strategy. That means starting one TensorFlow worker process per node, not one
per partition.
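
The difference between the two strategies can be illustrated with a minimal sketch. This is
not Ignite code: plan_workers, workers_per_node, the node names and the partition layout are
all hypothetical and only show how grouping partitions by node leaves at most one worker
process (and so one GPU owner) per node.

```python
from collections import defaultdict


def plan_workers(partition_to_node, per_node):
    """Return a list of (node, partitions) pairs, one pair per worker process.

    partition_to_node is a hypothetical mapping of partition id -> node name.
    """
    if not per_node:
        # Per-partition strategy: every partition gets its own worker process.
        return [(node, [part]) for part, node in sorted(partition_to_node.items())]

    # Per-node strategy: group partitions by node, one worker process per node.
    by_node = defaultdict(list)
    for part, node in sorted(partition_to_node.items()):
        by_node[node].append(part)
    return [(node, parts) for node, parts in sorted(by_node.items())]


def workers_per_node(plan):
    """Count how many worker processes the plan starts on each node."""
    counts = defaultdict(int)
    for node, _ in plan:
        counts[node] += 1
    return dict(counts)


# Hypothetical layout: four partitions spread over two nodes.
partitions = {0: "node-a", 1: "node-a", 2: "node-b", 3: "node-b"}

per_partition_plan = plan_workers(partitions, per_node=False)
per_node_plan = plan_workers(partitions, per_node=True)

print(workers_per_node(per_partition_plan))  # two workers compete for each node's GPU
print(workers_per_node(per_node_plan))       # a single worker per node
```

With the per-node plan, the single worker on each node would be responsible for all of that
node's partitions, so no second process tries to claim the same GPU memory.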

This message was sent by Atlassian JIRA
