tez-dev mailing list archives

From Subroto Sanyal <sanyalsubr...@gmail.com>
Subject Re: Tez configuration initialization ignoring JobConfigurable
Date Fri, 06 Jun 2014 17:37:02 GMT
Hi Hitesh,

I am trying to build and execute a DAG that is similar to MR but not exactly
MR: it has custom LogicalInput/LogicalOutput and Processor implementations,
and it needs intermediate sorting and shuffling (configured via the Edge).
Let's say we have a RawComparator class that looks like this:

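import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;

public class CustomRawComparator implements RawComparator, JobConfigurable {

    // Delegate that does the actual comparison; only set in configure().
    private RawComparator _comparator;

    @Override
    public void configure(JobConf conf) {
        // some sort of init process that builds the delegate from conf
        _comparator = ...;
    }

    @Override
    public int compare(Object o1, Object o2) {
        return _comparator.compare(o1, o2);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return _comparator.compare(b1, s1, l1, b2, s2, l2);
    }
}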


In my job client code I would write something like:

jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);



On the cluster side, whatever the framework (MRv1, MRv2 or MR on Tez), one
would expect to get a fully configured object when

ReflectionUtils.newInstance(class, conf) is invoked.

This call is used in the "ExternalSorter" class, but a Configuration object
is passed instead of a JobConf, so the "configure" method of
CustomRawComparator is never invoked. "ExternalSorter" is used in
"OnFileSortedOutput". TezUtils provides a utility that returns a
Configuration, but not one that returns a JobConf.
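
A minimal sketch of the newInstance dispatch described above, using stand-in
types rather than the real Hadoop classes. Conf, JobConfLike,
JobConfigurableLike and this newInstance are all hypothetical names that only
model the behaviour; they are not Hadoop or Tez APIs.

```java
// Stand-in model only: none of these types are the real Hadoop classes.
class DispatchSketch {

    /** Models org.apache.hadoop.conf.Configuration. */
    static class Conf { }

    /** Models org.apache.hadoop.mapred.JobConf, a subclass of Configuration. */
    static class JobConfLike extends Conf { }

    /** Models org.apache.hadoop.mapred.JobConfigurable. */
    interface JobConfigurableLike {
        void configure(JobConfLike conf);
    }

    /** A comparator-like object that is only usable after configure(). */
    static class CustomComparator implements JobConfigurableLike {
        boolean configured = false;

        @Override
        public void configure(JobConfLike conf) {
            configured = true;
        }
    }

    /** Models the dispatch: configure() runs only when conf is a JobConfLike. */
    static <T> T newInstance(Class<T> cls, Conf conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (conf instanceof JobConfLike && obj instanceof JobConfigurableLike) {
                ((JobConfigurableLike) obj).configure((JobConfLike) conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Plain Conf: the object comes back without configure() having run.
        System.out.println(newInstance(CustomComparator.class, new Conf()).configured);
        // JobConfLike: configure() is invoked during instantiation.
        System.out.println(newInstance(CustomComparator.class, new JobConfLike()).configured);
    }
}
```

In this model the only difference between the two calls is the runtime type
of the conf argument, which is the situation I see in ExternalSorter.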

I think there will be other situations in the Tez code base where this
problem exists.


** I patched tez-common so that TezUtils.createConfFromUserPayload returns a
JobConf instead of a Configuration, which solves the problem (though it may
not be a good solution).
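
The idea behind the patch can be sketched with stand-in types as below. This
is not the actual tez-common change: Conf, JobConfLike, toPayload,
createConfFromPayload and createJobConfFromPayload are hypothetical names,
and the payload format (java.util.Properties) is an assumption chosen only to
keep the example self-contained. The real JobConf does have a constructor
that takes a Configuration, which is what the copy constructor here models.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Stand-in model only: none of these types are the real Hadoop or Tez classes.
class PayloadSketch {

    /** Models org.apache.hadoop.conf.Configuration. */
    static class Conf {
        final Properties props = new Properties();
    }

    /** Models org.apache.hadoop.mapred.JobConf, including its copy constructor. */
    static class JobConfLike extends Conf {
        JobConfLike() { }

        JobConfLike(Conf other) {
            props.putAll(other.props);
        }
    }

    /** Serializes the key/value pairs; the runtime type of conf is lost here. */
    static byte[] toPayload(Conf conf) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            conf.props.store(out, null);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Unpatched behaviour in this model: always rebuilds a plain Conf. */
    static Conf createConfFromPayload(byte[] payload) {
        try {
            Conf conf = new Conf();
            conf.props.load(new ByteArrayInputStream(payload));
            return conf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Patched behaviour in this model: same data, wrapped as a JobConfLike. */
    static Conf createJobConfFromPayload(byte[] payload) {
        return new JobConfLike(createConfFromPayload(payload));
    }

    public static void main(String[] args) {
        Conf original = new JobConfLike();
        original.props.setProperty("example.key", "example.value");
        byte[] payload = toPayload(original);

        System.out.println(createConfFromPayload(payload) instanceof JobConfLike);
        System.out.println(createJobConfFromPayload(payload) instanceof JobConfLike);
    }
}
```

Wrapping at the individual call site instead of changing the return type of
the utility would be a narrower alternative, at the cost of having to find
every place that instantiates user classes.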


On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hitesh@apache.org> wrote:

> Hi Subroto
>
> Could you provide some more context on what you are trying to do? Are you
> trying to run MR-on-Tez? or a native Tez job?
> If you could provide us with some code showing what you are trying to do,
> we can help further. There are probably some bugs in the MR compatibility
> that we may have not come across.
>
> thanks
> — Hitesh
>
>
> On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sanyalsubroto@gmail.com>
> wrote:
>
> > Hi,
> >
> > Tez has a utility which creates a Configuration object from the payload:
> >
> > TezUtils.createConfFromUserPayload(byte[] payload); this method returns a
> > Configuration object even though the serialized byte[] can be of type
> > JobConf.
> >
> >
> > Once we get the Configuration we try to create a few objects using
> > ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance
> > checks whether the conf is an instance of
> > "org.apache.hadoop.mapred.JobConf" and, if so, invokes the "configure"
> > method.
> >
> >
> > This behavior no longer works in the Tez scenario. One simple case is
> > when a user defines a custom "RawComparator" and makes it
> > "JobConfigurable", but
> > org.apache.tez.runtime.library.common.sort.impl.ExternalSorter doesn't
> > care whether the configuration could be an instance of
> > "org.apache.hadoop.mapred.JobConf".
> > Please let me know if there is a problem with Tez or if I am missing
> > something about how objects should be created in Tez  :-)
> >
> > --
> > Cheers,
> > *Subroto Sanyal*
> >
>



-- 
Cheers,
*Subroto Sanyal*
