tez-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yingda Chen (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TEZ-4000) Enable downstream vertex connecting to an EPHEMERAL data source, to reason about network connections of upstream tasks.
Date Fri, 05 Oct 2018 19:36:00 GMT
Yingda Chen created TEZ-4000:
--------------------------------

             Summary: Enable downstream vertex connecting to an EPHEMERAL data source, to
reason about network connections of upstream tasks.
                 Key: TEZ-4000
                 URL: https://issues.apache.org/jira/browse/TEZ-4000
             Project: Apache Tez
          Issue Type: Task
            Reporter: Yingda Chen


This is an umbrella task for TEZ-3997

Another property that is usually shared with CONCURRENT on the same edge is EPHEMERAL data
source. When two vertices are running concurrently, direct communications between tasks in
those vertices become possible, and oftentimes necessary, throughout the lifetime of the running
task. This can be articulated by an EPHEMERAL data sources, and this change aims to support
such scenarios, which are readily found in real-time applications (such as interactive query)
and/or customized applications that would like to control their own data communications (such
as parameter-server).

 

This change will allow Tez to be the central orchestrator that gathers necessary network information
from all upstream tasks, compiles them together and send it to downstream tasks. Particularly,
the following changes are planned:
 # For two vertices connected via an edge with both CONCURRENT scheduling type and EPHEMERAL
data source type, the task in upstream vertex will open network port, and send an VertexManagerEvent(VME)
immediately upon running. The payload of VME includes necessary information to communicate
to this task through direct network communication (such as ip and port). The vertex manager
of the downstream vertex, typed VertexManagerWithConcurrentInputs, will receive these VMEs,
and are responsible for aggregate (including de-dup if necessary) all information together
in onVertexManagerEventReceived(). 
 # Once all VMEs have been received, a CustomProcessorEvent will be constructed with a payload
that includes the aggregated information, and be routed to downstream tasks.

The change will introduce additional optional entries in VertexManagerEventPayload and a new
custom payload that will be embedded into CustomProcessorEvent. 

 

Upon completion of functional feature in this change, additional feature such as handling
of failover in CONCURRENT/EPHEMERAL edge will be addressed in future umbrea JIRAs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message