tez-dev mailing list archives

From "Syed Shameerur Rahman (Jira)" <j...@apache.org>
Subject [jira] [Created] (TEZ-4140) TEZ Recovery: Discrepancy In Scheduling Vertices During Vertex Recovery
Date Tue, 07 Apr 2020 12:55:00 GMT
Syed Shameerur Rahman created TEZ-4140:
------------------------------------------

             Summary: TEZ Recovery: Discrepancy In Scheduling Vertices During Vertex Recovery
                 Key: TEZ-4140
                 URL: https://issues.apache.org/jira/browse/TEZ-4140
             Project: Apache Tez
          Issue Type: Bug
    Affects Versions: 0.9.2, 0.9.1, 0.8.4, 0.9.0, 0.8.2
            Reporter: Syed Shameerur Rahman
            Assignee: Syed Shameerur Rahman
             Fix For: 0.10.0, 0.9.3
         Attachments: DAG.png

*Issue*:

During vertex recovery, the initialization stage of a vertex is skipped if both

1) VertexInputInitializerEvent
2) VertexReconfigureDoneEvent

are seen in the recovery data. The initialization stage is skipped by replacing the vertex's
VertexManagerPlugin (e.g. ShuffleVertexManager, a custom VertexManager, etc.) with NoOpVertexManager.
There are a couple of issues with replacing the VertexManagerPlugin with NoOpVertexManager:

1) A VertexManagerPlugin's work is complete only after the tasks in that vertex have been launched.
Using NoOpVertexManager without checking whether tasks for that particular vertex were launched
in the previous run can therefore lead to a discrepancy in deciding when, and how many, tasks
should be launched in that vertex during recovery.

2) Maintaining vertex dependency:
Say we have two vertices v1 and v2, where v2 depends on v1 (v1 ---> v2). If, for some reason,
v1 was not able to skip its initialization stage but v2 was, there is a chance that v2 gets
scheduled before v1, since NoOpVertexManager is used for v2.
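The skip decision described above can be sketched as follows. This is a minimal illustration with hypothetical names, not Tez's actual API: the point is that the current check looks only at the events recorded for the vertex itself and ignores its parents.

```java
// Hypothetical sketch of the current recovery-time skip check (names are
// illustrative, not Tez's actual API): a vertex skips initialization based
// solely on the events in its own recovery data, ignoring its parents.
public class CurrentSkipCheck {

    // True when both recovery events were recorded for this vertex.
    static boolean canSkipInitialization(boolean sawInputInitializerEvent,
                                         boolean sawReconfigureDoneEvent) {
        return sawInputInitializerEvent && sawReconfigureDoneEvent;
    }

    public static void main(String[] args) {
        // v1 ---> v2: v1 cannot skip, but v2 can, so v2 may be scheduled
        // first because its NoOpVertexManager never consults v1's progress.
        System.out.println("v1 skips init: " + canSkipInitialization(false, true)); // false
        System.out.println("v2 skips init: " + canSkipInitialization(true, true));  // true
    }
}
```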

The above is the problem I faced. A DAG is attached for reference:

In the DAG, Reducer 7 depends on Reducer 6. For some reason, during Tez recovery, Reducer 6's
initialization stage was not skipped, whereas Reducer 7's was, and NoOpVertexManager was used
instead of ShuffleVertexManager. NoOpVertexManager went on to launch all the tasks in Reducer 7
without waiting for Reducer 6's completion. Initially it was decided that Reducer 6 would launch
14 tasks, so as per that information the tasks launched in Reducer 7 waited for 14 shuffle inputs.
Later, due to auto-reduce parallelism, the number of tasks in Reducer 6 was adjusted to 1, but
Reducer 7's tasks did not know about this and kept waiting for 14 shuffle inputs when in reality
there was only 1, hence the query was stuck. This can also lead to a deadlock when the number of
containers is limited and Reducer 7 ends up using all of them.



*Proposed Solution:*
In addition to the conditions on VertexInputInitializerEvent and VertexReconfigureDoneEvent,
introduce a couple more conditions:

1) Check whether tasks were launched in the vertex in the previous run before replacing the
VertexManagerPlugin with NoOpVertexManager.

2) All the parent vertices should have skipped their initialization stage before the child
vertex does. This is required to maintain vertex dependency.
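The proposed check with both additional conditions can be sketched as below. Again, the names and signature are hypothetical, not Tez's actual API; the sketch only shows the shape of the combined condition.

```java
import java.util.List;

// Hypothetical sketch of the proposed skip check (illustrative names, not
// Tez's actual API): besides the two recovery events, require that tasks
// were launched in the previous run (condition 1) and that every parent
// vertex has also skipped its initialization stage (condition 2).
public class ProposedSkipCheck {

    static boolean canSkipInitialization(boolean sawInputInitializerEvent,
                                         boolean sawReconfigureDoneEvent,
                                         boolean tasksLaunchedInPreviousRun,
                                         List<Boolean> parentsSkippedInit) {
        boolean allParentsSkipped =
                parentsSkippedInit.stream().allMatch(Boolean::booleanValue);
        return sawInputInitializerEvent
                && sawReconfigureDoneEvent
                && tasksLaunchedInPreviousRun   // condition 1
                && allParentsSkipped;           // condition 2
    }

    public static void main(String[] args) {
        // Reducer 7 depends on Reducer 6, which did not skip initialization,
        // so Reducer 7 must not skip either, even though both events were seen.
        boolean reducer7Skips = canSkipInitialization(true, true, true, List.of(false));
        System.out.println("Reducer 7 skips init: " + reducer7Skips); // false
    }
}
```

With this check, Reducer 7 in the attached DAG would fall back to its real ShuffleVertexManager, re-run vertex initialization, and pick up Reducer 6's adjusted parallelism instead of waiting for 14 shuffle inputs that never arrive.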




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
