tez-dev mailing list archives

From Subroto Sanyal <sanyalsubr...@gmail.com>
Subject Re: Tez configuration initialization ignoring JobConfigurable
Date Sat, 07 Jun 2014 04:42:05 GMT
Hi Hitesh,

Thanks for your inputs.
I can follow the approach described in the mail below, provided the
code/processor implementation is non-Tez code.
But what about the code that Tez itself provides? As I mentioned, the constructor
org.apache.tez.runtime.library.common.sort.impl.ExternalSorter.ExternalSorter(TezOutputContext,
Configuration, int, long) gets its configuration
from org.apache.tez.runtime.library.output.OnFileSortedOutput, which
generates the conf using:

this.conf =
TezUtils.createConfFromUserPayload(getContext().getUserPayload());

This conf is finally used to create the comparator:

comparator = ConfigUtils.getIntermediateOutputKeyComparator(this.conf);
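
To make the consequence concrete, here is a minimal, self-contained sketch of
the dispatch discussed in this thread. It is only a model: Configuration,
JobConf, JobConfigurable and newInstance below are simplified stand-ins written
for this example, not the real Hadoop or Tez classes, and CustomComparator is
hypothetical. It assumes, as described in the quoted mails, that configure() is
invoked only when the conf passed in is a JobConf.

```java
import java.util.HashMap;
import java.util.Map;

/** Self-contained model; the nested types are stand-ins, NOT the real Hadoop/Tez classes. */
public class ConfigureDispatchSketch {

    /** Stand-in for org.apache.hadoop.conf.Configuration. */
    public static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        public Configuration() {}
        public Configuration(Configuration other) { props.putAll(other.props); }
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key) { return props.get(key); }
    }

    /** Stand-in for org.apache.hadoop.mapred.JobConf. */
    public static class JobConf extends Configuration {
        public JobConf() {}
        public JobConf(Configuration conf) { super(conf); }
    }

    /** Stand-in for org.apache.hadoop.mapred.JobConfigurable. */
    public interface JobConfigurable {
        void configure(JobConf job);
    }

    /** Hypothetical comparator that is only usable once configure() has run. */
    public static class CustomComparator implements JobConfigurable {
        public boolean configured = false;
        public CustomComparator() {}
        @Override
        public void configure(JobConf job) { configured = true; }
    }

    /** Models the described check: configure() runs only when conf is a JobConf. */
    public static <T> T newInstance(Class<T> clazz, Configuration conf) {
        try {
            T instance = clazz.getDeclaredConstructor().newInstance();
            if (conf instanceof JobConf && instance instanceof JobConfigurable) {
                ((JobConfigurable) instance).configure((JobConf) conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz, e);
        }
    }

    public static void main(String[] args) {
        // A plain Configuration, like the one rebuilt from the user payload.
        CustomComparator a = newInstance(CustomComparator.class, new Configuration());
        // A JobConf, which is what the comparator expects to receive.
        CustomComparator b = newInstance(CustomComparator.class, new JobConf());
        System.out.println("plain Configuration -> configured=" + a.configured); // false
        System.out.println("JobConf             -> configured=" + b.configured); // true
    }
}
```

With a plain Configuration the comparator is constructed but never configured,
which mirrors what happens when the conf comes from
TezUtils.createConfFromUserPayload.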


Please let me know how this can be fixed. Do we need to change
org.apache.tez.runtime.library.output.OnFileSortedOutput, or does some
workaround exist?
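
One possible direction, shown here only as a sketch, is to apply the wrapping
pattern from the quoted mail before the comparator is created. The classes
below are simplified stand-ins written for this example, not the real Hadoop or
Tez classes; asJobConf, CustomComparator and the key "sketch.sort.order" are
hypothetical. Whether such a change belongs in OnFileSortedOutput or
ExternalSorter is the open question; this only illustrates the pattern.

```java
import java.util.HashMap;
import java.util.Map;

/** Self-contained model; the nested types are stand-ins, NOT the real Hadoop/Tez classes. */
public class JobConfWrapSketch {

    /** Stand-in for org.apache.hadoop.conf.Configuration. */
    public static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        public Configuration() {}
        public Configuration(Configuration other) { props.putAll(other.props); }
        public void set(String key, String value) { props.put(key, value); }
        public String get(String key) { return props.get(key); }
    }

    /** Stand-in for org.apache.hadoop.mapred.JobConf. */
    public static class JobConf extends Configuration {
        public JobConf() {}
        public JobConf(Configuration conf) { super(conf); }
    }

    /** Stand-in for org.apache.hadoop.mapred.JobConfigurable. */
    public interface JobConfigurable {
        void configure(JobConf job);
    }

    /** Hypothetical comparator that reads a setting in configure(). */
    public static class CustomComparator implements JobConfigurable {
        public String order; // stays null if configure() is never called
        public CustomComparator() {}
        @Override
        public void configure(JobConf job) { order = job.get("sketch.sort.order"); }
    }

    /** Hypothetical helper: reuse conf if it is already a JobConf, else copy it into one. */
    public static JobConf asJobConf(Configuration conf) {
        return (conf instanceof JobConf) ? (JobConf) conf : new JobConf(conf);
    }

    /** Models the described check: configure() runs only when conf is a JobConf. */
    public static <T> T newInstance(Class<T> clazz, Configuration conf) {
        try {
            T instance = clazz.getDeclaredConstructor().newInstance();
            if (conf instanceof JobConf && instance instanceof JobConfigurable) {
                ((JobConfigurable) instance).configure((JobConf) conf);
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + clazz, e);
        }
    }

    public static void main(String[] args) {
        // Plain Configuration, like the one rebuilt from the user payload.
        Configuration fromPayload = new Configuration();
        fromPayload.set("sketch.sort.order", "descending");
        // Wrapping first means configure() is reached and sees the same settings.
        CustomComparator c = newInstance(CustomComparator.class, asJobConf(fromPayload));
        System.out.println(c.order); // prints "descending"
    }
}
```

The copy keeps every property of the original conf, so the comparator sees the
same settings it would have seen if a JobConf had been passed all along.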


On Fri, Jun 6, 2014 at 10:58 PM, Hitesh Shah <hitesh@apache.org> wrote:

> Most of the MR compat layer code in Tez does something like the following:
>
>     byte[] userPayload = context.getUserPayload();
>     Configuration conf = TezUtils.createConfFromUserPayload(userPayload);
>     if (conf instanceof JobConf) {
>       this.jobConf = (JobConf)conf;
>     } else {
>       this.jobConf = new JobConf(conf);
>     }
>
> Some of the above should probably be fixed given that the deserialized
> payload currently cannot be an instance of JobConf but the above should
> give you an idea as to what is being done. If you look into
> ReduceProcessor, you will see the comparator being initialized
> using ConfigUtils::getInputKeySecondaryGroupingComparator() and it will
> always be passed an instance of JobConf.
>
> Let me know if you are following the above approach or if I am missing
> something which should be addressed in Tez.
>
> thanks
> — Hitesh
>
> On Jun 6, 2014, at 10:37 AM, Subroto Sanyal <sanyalsubroto@gmail.com>
> wrote:
>
> Hi Hitesh,
>
> I am trying to build and execute a DAG that is similar to MR but not exactly
> MR (it has custom LogicalInput/Output and Processor implementations) and that
> needs intermediate sorting and shuffling (configured via the Edge).
> Let's say we have a RawComparator class which looks like:
>
> public class CustomRawComparator implements RawComparator, JobConfigurable {
>
>    @Override
>    public void configure(JobConf conf) {
>      // some sort of init process
>      _comparator = blah blah blah
>    }
>
>    @Override
>    public int compare(Object o1, Object o2) {
>        return _comparator.compare(o1, o2);
>    }
>
>    @Override
>    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
>        return _comparator.compare(b1, s1, l1, b2, s2, l2);
>    }
>
> }
>
>
> In my jobclient code I will write something like:
>
> jobConf.setOutputKeyComparatorClass(CustomRawComparator.class);
>
>
>
> On the cluster side (whatever the framework, say MRv1, MRv2 or MR on Tez)
> one would expect to get a fully configured object when
>
> ReflectionUtils.newInstance(class, conf) is invoked.
>
> The above call is used in the "ExternalSorter" class, but a plain
> Configuration object is passed instead of a JobConf, which does not allow the
> "configure" method of the CustomRawComparator to be invoked. "ExternalSorter"
> is used in "OnFileSortedOutput". TezUtils provides a utility that returns a
> Configuration, but not a JobConf.
>
> I think there will be other situations/scenarios where this problem exists in
> the Tez code base.
>
>
> ** I patched Tez-common so that TezUtils.createConfFromUserPayload
> returns a JobConf instead of a Configuration, which solves the problem (it may
> not be a good solution).
>
>
> On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hitesh@apache.org> wrote:
>
> Hi Subroto
>
> Could you provide some more context on what you are trying to do? Are you
> trying to run MR-on-Tez, or a native Tez job?
> If you could provide us with some code showing what you are trying to do,
> we can help further. There are probably some bugs in the MR compatibility
> layer that we may not have come across.
>
> thanks
> — Hitesh
>
>
> On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sanyalsubroto@gmail.com>
> wrote:
>
> Hi,
>
> Tez has a utility which creates a Configuration object from the payload:
>
> TezUtils.createConfFromUserPayload(byte[] payload); this method returns a
> Configuration object even though the serialized byte[] can be of type
> JobConf.
>
> Once we get the Configuration, we try to create a few objects using
> ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance checks
> whether the conf is an instance of "org.apache.hadoop.mapred.JobConf"
> and accordingly invokes the "configure" method.
>
> This behavior no longer works in the Tez scenario. One simple scenario is
> when a user defines a custom "RawComparator" and makes it "JobConfigurable",
> but org.apache.tez.runtime.library.common.sort.impl.ExternalSorter does not
> care whether the configuration could be an instance of
> "org.apache.hadoop.mapred.JobConf".
> Please let me know if there is a problem with Tez or a lack of
> understanding on my part about how objects should be created in Tez  :-)
>
> --
> Cheers,
> *Subroto Sanyal*
>



-- 
Cheers,
*Subroto Sanyal*
