tez-dev mailing list archives

From Hitesh Shah <hit...@apache.org>
Subject Re: Tez configuration initialization ignoring JobConfigurable
Date Fri, 06 Jun 2014 20:58:51 GMT
Most of the MR compat layer code in Tez does something like the following:

    byte[] userPayload = context.getUserPayload();
    Configuration conf = TezUtils.createConfFromUserPayload(userPayload);
    if (conf instanceof JobConf) {
      this.jobConf = (JobConf)conf;
    } else {
      this.jobConf = new JobConf(conf);
    }

Some of this should probably be fixed, since the deserialized payload
currently can never be an instance of JobConf, but it should give you an
idea of what is being done. If you look into ReduceProcessor, you will see
the comparator being initialized using
ConfigUtils::getInputKeySecondaryGroupingComparator(), which is always
passed an instance of JobConf.
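
The wrap-if-needed pattern above can be sketched in isolation. This is only
an illustration: "Configuration" and "JobConf" below are hypothetical
stand-in classes (not the real org.apache.hadoop types, so the snippet runs
without Hadoop on the classpath), and "toJobConf" is a made-up helper name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for org.apache.hadoop.conf.Configuration and
// org.apache.hadoop.mapred.JobConf; these are NOT the real Hadoop classes.
class ConfNormalizeSketch {
    static class Configuration {
        final Map<String, String> props = new HashMap<>();
        Configuration() {}
        // Copy constructor: the new object starts with the other's properties.
        Configuration(Configuration other) { props.putAll(other.props); }
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    static class JobConf extends Configuration {
        JobConf(Configuration other) { super(other); }
    }

    // Reuse the object if it is already a JobConf, otherwise wrap it in a new one.
    static JobConf toJobConf(Configuration conf) {
        if (conf instanceof JobConf) {
            return (JobConf) conf;
        }
        return new JobConf(conf);
    }

    public static void main(String[] args) {
        Configuration plain = new Configuration();
        plain.set("example.key", "example-value");
        JobConf wrapped = toJobConf(plain);
        System.out.println(wrapped.get("example.key"));
        System.out.println(toJobConf(wrapped) == wrapped);
    }
}
```

With this kind of normalization, any code further down only ever sees a
JobConf, regardless of what the payload deserialized to.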

Let me know if you are following the above approach or if I am missing
something which should be addressed in Tez.

— Hitesh

On Jun 6, 2014, at 10:37 AM, Subroto Sanyal <sanyalsubroto@gmail.com> wrote:

Hi Hitesh,

I am trying to build and execute a DAG similar to MR but not exactly MR
(with custom LogicalInput/Output and Processor implementations), which
needs intermediate sorting and shuffling (configured via the Edge).
Let's say we have a RawComparator class which looks like:

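    import org.apache.hadoop.io.RawComparator;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobConfigurable;

    public class CustomRawComparator implements RawComparator, JobConfigurable {

      // Delegate comparator, set up in configure()
      private RawComparator _comparator;

      public void configure(JobConf conf) {
        // some sort of init process
        // _comparator = ... (initialization elided in the original message)
      }

      public int compare(Object o1, Object o2) {
        return _comparator.compare(o1, o2);
      }

      public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return _comparator.compare(b1, s1, l1, b2, s2, l2);
      }
    }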

In my jobclient code I will write something like:


On the cluster side (whatever the framework, say MRv1, MRv2 or MR on Tez),
one would expect to get a fully configured object when
ReflectionUtils.newInstance(class, conf) is invoked.

The above call is used in the "ExternalSorter" class, but a Configuration
object is passed instead of a JobConf, which does not allow the "configure"
method of the CustomRawComparator to be invoked. "ExternalSorter" is used
in "OnFileSortedOutput". TezUtils provides a utility that returns a
Configuration but not a JobConf.
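
To illustrate why the type of the conf object matters, here is a minimal
sketch of the dispatch described above. It assumes, as described in this
thread, that configure() is only invoked when the conf passed in is a
JobConf. All classes are hypothetical stand-ins, not the real Hadoop or Tez
classes, and "newInstance" here is a simplified imitation written for this
example, not the library method itself.

```java
// Hypothetical stand-ins only; none of these are the real Hadoop/Tez classes.
class JobConfigurableSketch {
    static class Configuration {}
    static class JobConf extends Configuration {}

    interface JobConfigurable {
        void configure(JobConf job);
    }

    // Plays the role of a JobConfigurable comparator; records whether
    // configure() was ever called.
    static class CustomComparator implements JobConfigurable {
        boolean configured = false;

        @Override
        public void configure(JobConf job) {
            configured = true;
        }
    }

    // Simplified imitation of the behaviour described in the thread:
    // configure() is only called when conf is actually a JobConf.
    static <T> T newInstance(Class<T> cls, Configuration conf) {
        try {
            T obj = cls.getDeclaredConstructor().newInstance();
            if (obj instanceof JobConfigurable && conf instanceof JobConf) {
                ((JobConfigurable) obj).configure((JobConf) conf);
            }
            return obj;
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        CustomComparator viaPlainConf =
            newInstance(CustomComparator.class, new Configuration());
        CustomComparator viaJobConf =
            newInstance(CustomComparator.class, new JobConf());
        System.out.println("plain Configuration -> configured = " + viaPlainConf.configured);
        System.out.println("JobConf             -> configured = " + viaJobConf.configured);
    }
}
```

In this sketch the object created with a plain Configuration is never
configured, which mirrors the situation reported for the comparator.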

I think there will be other situations where this problem exists in the Tez
code base.

** I patched tez-common so that TezUtils.createConfFromUserPayload
returns a JobConf instead of a Configuration, which solves the problem (it
may not be a good solution).

On Fri, Jun 6, 2014 at 6:57 PM, Hitesh Shah <hitesh@apache.org> wrote:

Hi Subroto

Could you provide some more context on what you are trying to do? Are you
trying to run MR-on-Tez or a native Tez job?
If you could provide us with some code showing what you are trying to do,
we can help further. There are probably some bugs in the MR compatibility
layer that we may not have come across.

— Hitesh

On Fri, Jun 6, 2014 at 6:53 AM, Subroto Sanyal <sanyalsubroto@gmail.com> wrote:


Tez has a utility which creates a Configuration object from the payload:

TezUtils.createConfFromUserPayload(byte[] payload); this method returns a
Configuration object even though the serialized byte[] can be of type

Once we get the Configuration we try to create a few objects using
ReflectionUtils.newInstance(class, conf). ReflectionUtils.newInstance makes
a check whether the conf is an instance of
"org.apache.hadoop.mapred.JobConf" and accordingly invokes the "configure"
method.

This behavior is not working anymore in the Tez scenario. One simple

when a user defines a custom "RawComparator" and makes it "JobConfigurable"
but org.apache.tez.runtime.library.common.sort.impl.ExternalSorter

care if the configuration could be an instance of "org.apache.hadoop.mapred.

Please let me know if there is a problem with Tez or there is a lack of
understanding about how objects should be created in Tez :-)

*Subroto Sanyal*
