tez-dev mailing list archives

From Subroto Sanyal <sanyalsubr...@gmail.com>
Subject Re: Tez configuration initialization ignoring JobConfigurable
Date Tue, 10 Jun 2014 07:23:50 GMT
Hi Sid,

I agree with your point: "Not very sure
we want to support MR constructs like JobConfigurable for this section of
the runtime, if we can avoid it."
It would definitely be a good idea to move away from MR constructs
completely, but I am sure many existing applications already rely on
such MR constructs.

Using Configurable will solve the problem.
I have filed a sub-task under TEZ-1198:
https://issues.apache.org/jira/browse/TEZ-1200
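
A minimal sketch of the Configurable-based workaround discussed in this thread, under stated assumptions. The Configuration, JobConf, Configurable and JobConfigurable types below are small stand-ins written only so the example compiles without Hadoop on the classpath; they are not the real Hadoop classes. The newInstance helper only imitates how a framework might configure a freshly created object, and the property name "sketch.comparator.reverse" is hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class ConfigurableComparatorSketch {

    // --- Stand-ins for the Hadoop types (assumptions, NOT the real classes) ---
    static class Configuration {
        protected final Map<String, String> props = new HashMap<>();
        Configuration() {}
        Configuration(Configuration other) { props.putAll(other.props); }
        void set(String key, String value) { props.put(key, value); }
        String get(String key, String def) { return props.getOrDefault(key, def); }
    }

    static class JobConf extends Configuration {
        JobConf() {}
        JobConf(Configuration other) { super(other); }
    }

    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    interface JobConfigurable {
        void configure(JobConf job);
    }

    // Comparator that can be initialized from either a Configuration or a JobConf.
    static class CustomComparator
            implements Comparator<String>, Configurable, JobConfigurable {
        private Configuration conf;
        private Comparator<String> delegate;
        int configureCalls = 0; // only here so the sketch can be inspected

        @Override
        public void setConf(Configuration conf) {
            this.conf = conf;
            if (conf == null) {
                return;
            }
            // Reuse the JobConf if we were given one, otherwise wrap the Configuration.
            JobConf job = (conf instanceof JobConf) ? (JobConf) conf : new JobConf(conf);
            configure(job);
        }

        @Override
        public Configuration getConf() {
            return conf;
        }

        @Override
        public void configure(JobConf job) {
            // May run twice when a JobConf is passed (once via setConf, once directly),
            // so the initialization is kept idempotent.
            configureCalls++;
            boolean reverse =
                Boolean.parseBoolean(job.get("sketch.comparator.reverse", "false"));
            delegate = reverse
                ? Comparator.<String>reverseOrder()
                : Comparator.<String>naturalOrder();
        }

        @Override
        public int compare(String a, String b) {
            return delegate.compare(a, b);
        }
    }

    // Imitates a framework helper: instantiate, then configure through whichever
    // interfaces the object implements.
    static <T> T newInstance(Class<T> cls, Configuration conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (obj instanceof Configurable) {
                ((Configurable) obj).setConf(conf);
            }
            if (conf instanceof JobConf && obj instanceof JobConfigurable) {
                ((JobConfigurable) obj).configure((JobConf) conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A plain Configuration, as the sorter receives after payload deserialization.
        Configuration plain = new Configuration();
        plain.set("sketch.comparator.reverse", "true");
        CustomComparator c = newInstance(CustomComparator.class, plain);
        System.out.println(c.compare("a", "b") > 0); // prints "true" (reversed order)
    }
}
```

With this shape the comparator is initialized even when only a plain Configuration is available, and an existing configure(JobConf) body can stay unchanged for the MR code path.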

Thanks for your inputs and suggestions.

Cheers,
Subroto Sanyal


On Tue, Jun 10, 2014 at 1:49 AM, Siddharth Seth <sseth@apache.org> wrote:

> Subroto,
> I'm guessing you already have a Comparator in place which makes use of
> JobConfigurable ?
> In terms of member fields, there isn't a lot of difference between
> Configuration and JobConf. JobConf primarily offers methods to look up the
> Configuration. In terms of serialization, they're the same.
> For things like Sort and Shuffle (which is where the comparators are being
> used), we've tried to remove direct MapReduce dependencies. Not very sure
> we want to support MR constructs like JobConfigurable for this section of
> the runtime, if we can avoid it. That said, I just filed a jira to track
> incompatible changes when using yarn-tez as the framework - TEZ-1198, could
> you please file this issue as a sub-task of this.
>
> A temporary workaround, if changing your comparator is an option, would be
> to use Configurable - and check / create a JobConf based on how it's
> configured.
>
> Thanks
> - Sid
>
>
> On Fri, Jun 6, 2014 at 9:42 PM, Subroto Sanyal <sanyalsubroto@gmail.com>
> wrote:
>
> > Hi Hitesh,
> >
> > Thanks for your inputs.
> > I can follow the approach described in the mail below, provided the
> > code/processor implementation is non-Tez code.
> > But what about the code that Tez itself provides? As I mentioned, the
> > org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.ExternalSorter(TezOutputContext,
> > Configuration, int, long) constructor gets its configuration
> > from org.apache.tez.runtime.library.output.OnFileSortedOutput, which
> > generates the conf using:
> >
> > this.conf =
> > TezUtils.createConfFromUserPayload(getContext().getUserPayload());
> >
> > This conf is finally used to create the comparator:
> >
> > comparator = ConfigUtils.getIntermediateOutputKeyComparator(this.conf);
> >
> >
> > Please let me know how this can be fixed. Do we need to change
> > org.apache.tez.runtime.library.output.OnFileSortedOutput, or does some
> > workaround exist?
> >
> >
> > On Fri, Jun 6, 2014 at 10:58 PM, Hitesh Shah <hitesh@apache.org> wrote:
> >
> > > Most of the MR compat layer code in Tez does something like the
> > > following:
> > >
> > >     byte[] userPayload = context.getUserPayload();
> > >     Configuration conf = TezUtils.createConfFromUserPayload(userPayload);
> > >     if (conf instanceof JobConf) {
> > >       this.jobConf = (JobConf)conf;
> > >     } else {
> > >       this.jobConf = new JobConf(conf);
> > >     }
> > >
> > > Some of the above should probably be fixed given that the deserialized
> > > payload currently cannot be an instance of JobConf but the above should
> > > give you an idea as to what is being done. If you look into
> > > ReduceProcessor, you will see the comparator being initialized
> > > using ConfigUtils::getInputKeySecondaryGroupingComparator() and it will
> > > always be passed an instance of JobConf.
> > >
> > > Let me know if you are following the above approach or if I am missing
> > > something which should be addressed in Tez.
> > >
> > > thanks
> > > — Hitesh
> > >
> > > On Jun 6, 2014, at 10:37 AM, Subroto Sanyal <sanyalsubroto@gmail.com>
> > > wrote:
> > >
> > > Hi Hitesh,
> > >
> > > I am trying to build and execute a DAG that is similar to MR but not
> > > exactly MR (it has custom LogicalInput/Output and Processor
> > > implementations) and needs intermediate sorting and shuffling
> > > (configured via the Edge).
> > > Let's say we have a RawComparator class that looks like:
> > >
> > > public class CustomRawComparator implements RawComparator, JobConfigurable {
> > >
> > >    // delegate comparator, built in configure()
> > >    private RawComparator _comparator;
> > >
> > >    @Override
> > >    public void configure(JobConf conf) {
> > >      // some sort of init process
> > >      _comparator = blah blah blah
> > >    }
> > >
> > >    @Override
> > >    public int compare(Object o1, Object o2) {
> > >        return _comparator.compare(o1, o2);
> > >    }
> > >
> > >    @Override
> > >    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
> > >        return _comparator.compare(b1, s1, l1, b2, s2, l2);
> > >    }
> > >
> > > }
> > >
> > >
> > > In my jobclient code I will write something like:
> > >
> > > jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);
> > >
> > >
> > >
> > > On the cluster side (whatever the framework: MRv1, MRv2, or MR on Tez),
> > > one would expect to get a fully configured object when
> > > ReflectionUtils.newInstance(class, conf) is invoked.
> > >
> > > That call is used in the "ExternalSorter" class, but a plain
> > > Configuration object is passed instead of a JobConf, so the "configure"
> > > method of CustomRawComparator is never invoked. "ExternalSorter" is used
> > > in "OnFileSortedOutput". TezUtils provides a utility that returns a
> > > Configuration, but not a JobConf.
> > >
> > > I think there will be other places in the Tez code base where this
> > > problem exists.
> > >
> > >
> > > ** I patched tez-common so that TezUtils.createConfFromUserPayload
> > > returns a JobConf instead of a Configuration, which solves the problem
> > > (though it may not be a good solution).
> > >
> > >
> > > On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hitesh@apache.org> wrote:
> > >
> > > Hi Subroto
> > >
> > > Could you provide some more context on what you are trying to do? Are
> > > you trying to run MR-on-Tez? or a native Tez job?
> > > If you could provide us with some code showing what you are trying to
> > > do, we can help further. There are probably some bugs in the MR
> > > compatibility that we may have not come across.
> > >
> > > thanks
> > > — Hitesh
> > >
> > >
> > > On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sanyalsubroto@gmail.com>
> > > wrote:
> > >
> > > Hi,
> > >
> > > Tez has a utility that creates a Configuration object from the payload:
> > >
> > > TezUtils.createConfFromUserPayload(byte[] payload). This method returns
> > > a Configuration object even though the serialized byte[] may have come
> > > from a JobConf.
> > >
> > > Once we get the Configuration, we try to create a few objects using
> > > ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance
> > > checks whether the conf is an instance of
> > > "org.apache.hadoop.mapred.JobConf" and, if so, invokes the "configure"
> > > method.
> > >
> > > This behavior no longer works in the Tez scenario. One simple case is
> > > when a user defines a custom "RawComparator" and makes it
> > > "JobConfigurable", but
> > > org.apache.tez.runtime.library.common.sort.impl.ExternalSorter does not
> > > care whether the configuration could be an instance of
> > > "org.apache.hadoop.mapred.JobConf".
> > > Please let me know if there is a problem with Tez or a gap in my
> > > understanding of how objects should be created in Tez  :-)
> > >
> > > --
> > > Cheers,
> > > *Subroto Sanyal*
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Cheers,
> > > *Subroto Sanyal*
> > >
> >
> >
> >
> > --
> > Cheers,
> > *Subroto Sanyal*
> >
>



-- 
Cheers,
*Subroto Sanyal*
