tez-dev mailing list archives

From: Bikas Saha <bi...@hortonworks.com>
Subject: RE: A few questions on the APIs
Date: Wed, 14 Aug 2013 18:51:39 GMT
Thanks for playing with it.

When you say you are trying to make the MRRSleep job pure Tez, do you
intend to remove the map processor and reduce processor and write your
own processor?

You are right that Processors represent actual computations. However, they
also need to send control-plane information back to the AM, ranging from
basic things like progress to advanced things like data for a user-code
vertex manager that determines the properties of the next vertex. Hence
some subset of the umbilical, or some context object that connects the
processor to the umbilical, has to be exposed to processors. Currently we
are using a mix of MRTask, MapProcessor, etc. to achieve this, because we
wanted to get MR-based functionality working as soon as possible to
deliver real-world benefits.
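
To make the idea concrete, here is a minimal Java sketch of a processor
that talks to the control plane only through a narrow context instead of
the raw umbilical. Every name in it (ProcessorContext, setProgress,
sendVertexManagerData, CountingProcessor, RecordingContext) is
hypothetical and is not the Tez API; RecordingContext merely stands in
for the umbilical so the example runs on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ProcessorContextSketch {

    /** Hypothetical: the narrow slice of the umbilical a processor may use. */
    public interface ProcessorContext {
        void setProgress(float progress);           // basic: progress reporting
        void sendVertexManagerData(byte[] payload); // advanced: input for a user vertex manager
    }

    /** Hypothetical: user code implements only the computation. */
    public interface Processor {
        void run(List<String> input, ProcessorContext context);
    }

    /** Test double standing in for the real umbilical; records every call. */
    public static class RecordingContext implements ProcessorContext {
        public final List<Float> progress = new ArrayList<>();
        public final List<String> vertexManagerData = new ArrayList<>();

        @Override public void setProgress(float p) { progress.add(p); }

        @Override public void sendVertexManagerData(byte[] payload) {
            vertexManagerData.add(new String(payload, StandardCharsets.UTF_8));
        }
    }

    /** Hypothetical example processor: counts records, reporting progress after each. */
    public static class CountingProcessor implements Processor {
        @Override
        public void run(List<String> input, ProcessorContext context) {
            int seen = 0;
            for (String record : input) {
                seen++;
                context.setProgress((float) seen / input.size());
            }
            // Hand the record count to a (hypothetical) downstream vertex manager,
            // which might use it to choose the parallelism of the next vertex.
            context.sendVertexManagerData(
                Integer.toString(seen).getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Runs CountingProcessor against the recording stand-in and returns it. */
    public static RecordingContext runDemo(List<String> input) {
        RecordingContext ctx = new RecordingContext();
        new CountingProcessor().run(input, ctx);
        return ctx;
    }

    public static void main(String[] args) {
        RecordingContext ctx = runDemo(List.of("a", "b", "c", "d"));
        System.out.println(ctx.progress);          // [0.25, 0.5, 0.75, 1.0]
        System.out.println(ctx.vertexManagerData); // [4]
    }
}
```

Because the processor only sees the interface, the same processor could
run against the real umbilical or against a test double like this one.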

The APIs and separation of concerns have not been cleanly established in
that part of the code. Ideally, we want YarnTezDAGChild (the main Tez
shell) to instantiate processors and pass them a context object through
which they can communicate essential information back to the control
plane. We are not there yet, which is why we have not yet been able to
write a multi-input, multi-output processor. It's on the agenda and is
becoming increasingly important.
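
One way that intended shape could look, sketched under assumptions: the
shell instantiates the processor class named in the DAG via reflection,
hands it named inputs, named outputs and a context, and keeps start-up
and completion reporting for itself. TaskShellSketch, MultiIOProcessor,
TaskContext, UnionProcessor and runTask are all hypothetical names, and
plain lists stand in for real Tez inputs and outputs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskShellSketch {

    /** Hypothetical context the shell hands to every processor. */
    public interface TaskContext {
        void setProgress(float progress);
    }

    /** Hypothetical multi-input, multi-output processor contract. */
    public interface MultiIOProcessor {
        void run(Map<String, List<String>> inputs,
                 Map<String, List<String>> outputs,
                 TaskContext context);
    }

    /** Hypothetical user processor: concatenates inputs "left" and "right" into output "merged". */
    public static class UnionProcessor implements MultiIOProcessor {
        public UnionProcessor() {} // the shell instantiates via a no-arg constructor

        @Override
        public void run(Map<String, List<String>> inputs,
                        Map<String, List<String>> outputs,
                        TaskContext context) {
            List<String> merged = outputs.get("merged");
            merged.addAll(inputs.get("left"));
            context.setProgress(0.5f);
            merged.addAll(inputs.get("right"));
            context.setProgress(1.0f);
        }
    }

    /**
     * Hypothetical shell: owns instantiation and completion; the processor
     * only sees the context. Returns the last progress the processor reported.
     */
    public static float runTask(String processorClassName,
                                Map<String, List<String>> inputs,
                                Map<String, List<String>> outputs) {
        float[] lastProgress = {0f};
        // A real shell would forward this over the umbilical to the AM.
        TaskContext context = p -> lastProgress[0] = p;
        MultiIOProcessor processor;
        try {
            processor = (MultiIOProcessor) Class.forName(processorClassName)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + processorClassName, e);
        }
        processor.run(inputs, outputs, context);
        // The shell, not the processor, would report task completion here.
        return lastProgress[0];
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new HashMap<>();
        inputs.put("left", List.of("a", "b"));
        inputs.put("right", List.of("c"));
        Map<String, List<String>> outputs = new HashMap<>();
        outputs.put("merged", new ArrayList<>());
        float progress = runTask(UnionProcessor.class.getName(), inputs, outputs);
        System.out.println(outputs.get("merged") + " progress=" + progress);
    }
}
```

Keeping completion reporting in runTask rather than in the processor is
the split between "task" and "processor" responsibilities discussed in
this thread, expressed as a sketch rather than a settled design.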

It would be great if you could provide a list of the oddities and issues
you have discovered; that will serve as feedback for us when we clean this
part up. Even better if you would like to help us clean it up.


-----Original Message-----
From: Mark Wagner [mailto:wagner.mark.d@gmail.com]
Sent: Tuesday, August 13, 2013 9:29 PM
To: dev@tez.incubator.apache.org
Subject: A few questions on the APIs

Hey everyone,

I've been playing with the MRRSleep example to familiarize myself with
Tez. As an exercise, I've been trying to remove all the map and reduce
parts to make it "pure" Tez, but I'm a bit hung up on the roles of
Processors and Tasks. They seem to serve very similar roles. My
expectation was that Tasks would handle all the start-up and coordination
with the DAG AM, while Processors are more user-facing and would mostly
focus on the actual computation (given that the processor can be
specified via the DAG APIs). But it looks like MRTask (which is ultimately
extended as a Map or Reduce Processor) does send completion notifications
to the AM via the umbilical. Is there a good guideline for what the
responsibilities of Processors and Tasks are and where the separation lies?

Thanks for the insights,

