tez-dev mailing list archives

From "Adrian Nicoara (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-4060) NoOpVertexManager schedules tasks that are not ready to run
Date Thu, 04 Apr 2019 20:32:00 GMT
Adrian Nicoara created TEZ-4060:
-----------------------------------

             Summary: NoOpVertexManager schedules tasks that are not ready to run
                 Key: TEZ-4060
                 URL: https://issues.apache.org/jira/browse/TEZ-4060
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.9.1
            Reporter: Adrian Nicoara


During recovery, vertices which have already been reconfigured get assigned a NoOpVertexManager:
[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L2689-L2711]

[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/RecoveryParser.java#L970-L972]

The NoOpVertexManager directly schedules tasks upon being started:

[https://github.com/apache/tez/blob/8395a9560a131799f1af49b26e1f10f12ef48752/tez-dag/src/main/java/org/apache/tez/dag/app/dag/impl/VertexImpl.java#L4628]

However, in a large graph, all vertices can end up configured and started before many of
their inputs have been generated. This affects vertices that are not attached to the roots.

As a result, tasks are scheduled before they are ready to run, and they keep failing until
their inputs are generated.
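
The difference between the two scheduling policies can be illustrated with a small toy model.
This is only a sketch and does not use any Tez classes: the class RecoverySchedulingSketch, its
methods, and the three-vertex chain (root, mid, leaf) are hypothetical names chosen for
illustration. It assumes every unfinished vertex is started at once, as described above, and
compares scheduling without an input check against scheduling that waits for all sources.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model, not Tez code: compares two scheduling policies on a chain of vertices. */
final class RecoverySchedulingSketch {

    /** Hypothetical example graph: maps each vertex to its source vertices (root -> mid -> leaf). */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> sources = new LinkedHashMap<>();
        sources.put("root", Collections.<String>emptyList());
        sources.put("mid", Arrays.asList("root"));
        sources.put("leaf", Arrays.asList("mid"));
        return sources;
    }

    /**
     * Returns the vertices whose tasks are scheduled when all unfinished vertices start at once.
     * With checkInputs == false, every started vertex schedules immediately.
     * With checkInputs == true, a vertex schedules only if all of its sources have completed.
     */
    static List<String> scheduledOnStart(Map<String, List<String>> sources,
                                         Set<String> completed,
                                         boolean checkInputs) {
        List<String> scheduled = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : sources.entrySet()) {
            String vertex = entry.getKey();
            if (completed.contains(vertex)) {
                continue; // already finished, nothing left to schedule
            }
            boolean inputsReady = completed.containsAll(entry.getValue());
            if (!checkInputs || inputsReady) {
                scheduled.add(vertex);
            }
        }
        return scheduled;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = exampleGraph();
        Set<String> completed = Collections.singleton("root"); // only root finished so far

        // No input check: leaf is scheduled although mid has produced nothing yet.
        System.out.println(scheduledOnStart(graph, completed, false)); // [mid, leaf]

        // Input check: only mid is scheduled, leaf waits for mid.
        System.out.println(scheduledOnStart(graph, completed, true));  // [mid]
    }
}
```

In this model, "leaf" in the first result corresponds to a task that is scheduled but cannot
make progress until "mid" has generated its output.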

In addition to bypassing input dependency checking, which is generally done in VertexManagerPlugin#onSourceTaskCompleted,
we lose the ability to execute custom logic within our own VertexManagerPlugins that is
needed for the configuration of downstream vertices. This is because we communicate
some graph configuration metadata through global objects that are populated through calls
to VertexManagerPlugin#onVertexStateUpdated.
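
The second effect can also be sketched with a toy model. Again, this is not Tez code and not
our actual plugins: the Manager interface, the REGISTRY map, and both manager classes are
hypothetical stand-ins, and the parallelism value is just an example of metadata. The sketch
assumes that a shared object is filled only from the state-update callback, so replacing the
custom manager with one that ignores the callback leaves the shared object empty.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Toy model, not Tez code: a shared registry that is filled only by a state-update callback. */
final class StateUpdateSketch {

    /** Hypothetical stand-in for the callback surface of a vertex manager. */
    interface Manager {
        void onVertexStateUpdated(String vertexName, int parallelism);
    }

    /** Hypothetical global object that downstream vertices read during configuration. */
    static final Map<String, Integer> REGISTRY = new ConcurrentHashMap<>();

    /** Custom manager: records the upstream metadata when it is notified. */
    static final class PublishingManager implements Manager {
        @Override
        public void onVertexStateUpdated(String vertexName, int parallelism) {
            REGISTRY.put(vertexName, parallelism);
        }
    }

    /** Replacement manager that does not run the custom logic. */
    static final class IgnoringManager implements Manager {
        @Override
        public void onVertexStateUpdated(String vertexName, int parallelism) {
            // intentionally empty
        }
    }

    /** What a downstream vertex would see when it looks up the upstream metadata. */
    static Optional<Integer> lookup(String vertexName) {
        return Optional.ofNullable(REGISTRY.get(vertexName));
    }

    public static void main(String[] args) {
        new IgnoringManager().onVertexStateUpdated("upstream", 8);
        System.out.println(lookup("upstream").isPresent()); // false

        new PublishingManager().onVertexStateUpdated("upstream", 8);
        System.out.println(lookup("upstream")); // Optional[8]
    }
}
```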



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
