tez-dev mailing list archives

From Mark Wagner <wagner.mar...@gmail.com>
Subject Re: A few questions on the APIs
Date Wed, 14 Aug 2013 23:22:19 GMT
Yes, that's exactly what I'm trying to do. It definitely makes sense
that the processor needs to do some control plane stuff. I guess the
question then is whether both Task and Processor are needed. It seems like
what the Task does could be split between YarnTezDAGChild and
user-specific Processors.
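
A minimal sketch of that split, to make the idea concrete. Every name
here (ProcessorContext, Processor, Shell) is hypothetical and not an
actual Tez class; Shell only stands in for the role YarnTezDAGChild
might play. The shell owns start-up and completion reporting, and the
processor sees only a narrow context for control plane messages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; none of these are actual Tez classes.

/** Narrow control-plane handle that the shell hands to a processor. */
interface ProcessorContext {
    void reportProgress(float fraction);
}

/** User-facing computation; it never touches the umbilical directly. */
interface Processor {
    void run(ProcessorContext context) throws Exception;
}

/** Stand-in for the shell role; records what it would send to the AM. */
final class Shell {
    final List<String> sentToAm = new ArrayList<>();

    void runTask(Processor processor) {
        // The shell decides how progress reaches the AM; the processor only reports it.
        ProcessorContext context = fraction -> sentToAm.add("progress " + fraction);
        try {
            processor.run(context);
            // Completion is reported by the shell, not by user code.
            sentToAm.add("completed");
        } catch (Exception e) {
            sentToAm.add("failed: " + e.getMessage());
        }
    }
}

public class SplitSketch {
    public static void main(String[] args) {
        Shell shell = new Shell();
        shell.runTask(ctx -> {
            ctx.reportProgress(0.5f);
            ctx.reportProgress(1.0f);
        });
        shell.sentToAm.forEach(System.out::println);
    }
}
```

With this shape a processor that forgets to signal completion can't
exist, since the shell sends that message itself.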

The Task vs. Processor split is the main weirdness I've run into, I think.
Some of this is exacerbated by the naming of Task, MRTask, and
MRRuntimeTask. The other things I've run into:
- Processors need to have a constructor with TezEngineTaskContext as
an argument, but that isn't mentioned anywhere
- When I was trying my processor (which didn't give completion events
to the AM), I got some misleading logs saying "No task currently
assigned to container...". It turns out my processor had run and then gone
back to the AM looking for more work (I assume this is for container
reuse, yes?). It'd be nice if this case could be distinguished with
e.g. "The task currently assigned to this container hasn't completed".
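
To illustrate the first point, here is a sketch of how an undocumented
constructor requirement could be checked up front. It assumes the
processor is instantiated reflectively, which I haven't confirmed in
the Tez code. TaskContext is a hypothetical stand-in for
TezEngineTaskContext, and GoodProcessor, BadProcessor and
ConstructorCheck are made-up names.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for TezEngineTaskContext; not the real class.
final class TaskContext {
    final String taskId;

    TaskContext(String taskId) {
        this.taskId = taskId;
    }
}

/** Has the one-argument constructor the runtime is assumed to look for. */
class GoodProcessor {
    final TaskContext context;

    public GoodProcessor(TaskContext context) {
        this.context = context;
    }
}

/** Only has a no-argument constructor, so the reflective lookup fails. */
class BadProcessor {
    public BadProcessor() {
    }
}

public class ConstructorCheck {
    /** Tries to build the processor and returns a readable outcome. */
    static String check(Class<?> processorClass, TaskContext context) {
        try {
            Constructor<?> ctor = processorClass.getDeclaredConstructor(TaskContext.class);
            ctor.setAccessible(true);
            Object processor = ctor.newInstance(context);
            return "instantiated " + processor.getClass().getSimpleName();
        } catch (NoSuchMethodException e) {
            // The case that is easy to hit when the requirement is undocumented.
            return processorClass.getSimpleName() + " lacks a (TaskContext) constructor";
        } catch (ReflectiveOperationException e) {
            return "could not instantiate " + processorClass.getSimpleName() + ": " + e;
        }
    }

    public static void main(String[] args) {
        TaskContext context = new TaskContext("task_0");
        System.out.println(check(GoodProcessor.class, context));
        System.out.println(check(BadProcessor.class, context));
    }
}
```

An explicit message like the second one would have saved me some
digging, whatever the actual instantiation mechanism turns out to be.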


On Wed, Aug 14, 2013 at 11:51 AM, Bikas Saha <bikas@hortonworks.com> wrote:
> Thanks for playing with it.
> When you say you are trying to make the MRRSleep job be pure Tez are you
> intent on removing the map processor and reduce processor and writing your
> own processor?
> You are right that Processors represent actual computations. However, they
> do need to be able to send control plane information back to the AM for
> basic things like progress and advanced things like data for some user
> code vertex manager that determines the properties for the next vertex.
> Hence some subset of the umbilical or some reference context that connects
> the processor to the umbilical is necessary to be exposed to the
> processors. Currently we are using a mix of MRTask, MapProcessor etc. to
> achieve the end goal because we wanted to get MR based functionality
> working asap to give real world benefits.
> The APIs and separation of concerns have not been cleanly established in
> that part of the code. We ideally want YarnTezDAGChild (the main Tez
> shell) to be able to instantiate processors and pass them some context
> object by which they can communicate essential information back to the
> control plane. We are not there yet, which is why we haven't been able to
> write a multi-input multi-output processor yet. It's on the agenda and
> becoming increasingly important.
> It would be great if you could provide a list of the weirdness and issues
> you have discovered; it will serve as feedback for us when we clean this
> part up. Even better if you want to help us clean it up.
> Bikas
> -----Original Message-----
> From: Mark Wagner [mailto:wagner.mark.d@gmail.com]
> Sent: Tuesday, August 13, 2013 9:29 PM
> To: dev@tez.incubator.apache.org
> Subject: A few questions on the APIs
> Hey everyone,
> I've been playing with the MRRSleep example to familiarize myself with
> Tez. I've been trying to remove all the map and reduce parts to make it
> "pure" Tez as an exercise, but I'm a bit hung up on the roles of
> Processors and Tasks. It seems like they serve very similar roles. My
> expectation was that Tasks would handle all the start-up and coordination
> with the DAG AM, while the processors are more user-facing and would
> mostly focus on the actual computation (given that the processor can be
> specified via the DAG APIs). But it looks like MRTask (which is ultimately
> extended as a Map or Reduce Processor) does send completion notifications
> to the AM with the umbilical. Is there a good guideline as to what the
> responsibilities of
> Processors and Tasks are and where the separation is?
> Thanks for the insights,
> Mark
