tez-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yingda Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-4018) Allow conditional vertex in DAG execution
Date Tue, 13 Nov 2018 20:43:00 GMT
Yingda Chen created TEZ-4018:

             Summary: Allow conditional vertex in DAG execution
                 Key: TEZ-4018
                 URL: https://issues.apache.org/jira/browse/TEZ-4018
             Project: Apache Tez
          Issue Type: New Feature
            Reporter: Yingda Chen

A high-level description is provided here for now, will follow up with a proper design doc

We have encountered a few application scenarios for dynamic (logical) DAG here in our system.
A typical one is for a distributed query to be able to dynamically choose among two execution
paths, say among hash join and sorted merge join.

This can be solved by allowing TEZ to execute "conditional DAG". By that we mean that a DAG
may have some conditional vertices: several conditional verteices can form a conditioanl group,
and insided each group, only one will be chosen for execution at runtime.

To allow decision at runtime, each conditional group will be associated with a “control
vertex”. A control vertex can be a pure virtual component that lives only on AM with its
VertexImpl and VetexManager, but has no associated tasks (DoP = 0). It can be also be extended
to have physical tasks associated with it, in the case where intensitve compuation may be
required to make a control decision.

A upstream vertex (that produces input data to verteices in downstream conditioanl group)
will be connected to the control vertex, as well as all conditional vertices in the group
at the same time. This allows its VMEs and DMEs to be sent to all of them. Upon receipt of
(enough) VMEs, the control vertex would be able to gather enough runtime statistics and determine
(by user-supplied logic) which downstream vertex should be scheduled (and which should be
skipped). Such decision will effectively “uncondition” the DAG and determines the sub-graph
that is actually being executed.

Such conditional verteices can be useful to enable scenarios such as conditional join, where
a query can choose between hash join and sorted merge join at runtime, base on the precious
runtime statistics (e.g., output size) of upstream mapper.

This message was sent by Atlassian JIRA

View raw message