tez-dev mailing list archives

From Chris K Wensel <ch...@wensel.net>
Subject Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies
Date Thu, 26 Feb 2015 21:18:27 GMT
The immediate issue is having two mutually exclusive artifacts: tez-yarn-timeline-history and

Outside of ATSHistoryACLPolicyManager, the code is identical; only the dependencies change.

TezClient attempts to load this Manager, on the assumption that if the class exists, Tez is
running on Hadoop 2.6. (Running on 2.4 is then fatal.)

My recommendation would be never to change artifact names (or conditionally choose them) inside
of major releases, but accreting new, optional, ones as versions progress is fine.

Thus I would either:

1. create a single artifact tez-yarn-timeline-history compiled with a default dep of Hadoop 2.6,
that includes the Manager. Update the TezClient code to fail gracefully if the Manager is
not applicable (the runtime env is Hadoop 2.4).

2. offer tez-yarn-timeline-history-with-acls as an optional artifact for Hadoop 2.6 deployments,
with the single Manager class in it, which in turn requires the tez-yarn-timeline-history
artifact -- which is sufficient for a 2.4 runtime. If the user provides the additional -with-acls
artifact, they are knowingly going to have problems on Hadoop 2.4.

I prefer the first as it keeps my build file simple. Graceful degradation of services per
environment (with appropriate logging) is a well-accepted practice.
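
To make the graceful-degradation idea concrete, here is a minimal sketch of loading an optional class by name and falling back when it (or one of its dependencies) is missing at runtime. This is not the actual TezClient code: the class OptionalManagerLoader, the method loadIfAvailable, and the class name org.example.hypothetical.MissingManager are all made up for illustration.

```java
import java.util.Optional;
import java.util.logging.Logger;

public class OptionalManagerLoader {
    private static final Logger LOG =
        Logger.getLogger(OptionalManagerLoader.class.getName());

    // Hypothetical helper: tries to instantiate an optional class by name.
    // Returns Optional.empty() instead of failing when the class, or
    // something it links against, is not available in this runtime.
    static Optional<Object> loadIfAvailable(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return Optional.of(clazz.getDeclaredConstructor().newInstance());
        } catch (ClassNotFoundException | LinkageError e) {
            // Class absent, or present but compiled against APIs that this
            // runtime does not provide (e.g. a NoClassDefFoundError).
            LOG.warning("Optional class " + className
                + " not usable in this environment, continuing without it: " + e);
            return Optional.empty();
        } catch (ReflectiveOperationException e) {
            // Class found but could not be constructed reflectively.
            LOG.warning("Could not instantiate " + className + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Made-up class name, expected to be absent from the classpath.
        Optional<Object> manager =
            loadIfAvailable("org.example.hypothetical.MissingManager");
        System.out.println("manager available: " + manager.isPresent());
    }
}
```

The caller checks the Optional and simply skips the ACL-related setup when it is empty, so the same jar can run in both environments and the log records why the feature was disabled.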

And you can now test Tez across multiple versions of Hadoop/YARN at runtime (outside of compile

We do this with Cascading, using just simple build file modifications to verify binary compatibility
(vendors fork this repo to verify their distributions, and have been known to find critical bugs):



> On Feb 26, 2015, at 11:03 AM, Hitesh Shah <hitesh@apache.org> wrote:
> Hi folks, 
> Chris raised a good point earlier in terms of publishing jars for use against different
> versions of hadoop. For the most part, I think we have done well to ensure that the user-facing
> modules are version agnostic but the same does not hold for other modules which at times
> are needed by other applications for testing.
> There aren’t really too many different options we can try. The simplest option I can
> think of is just to build tez against different versions of hadoop with the tez.version set
> to something along the lines of “tez.version-hadoop.version”. This would imply having
> tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop26. From a usability point of view, depending
> on the option we pick, users will need to switch their dependencies to point to an appropriate
> version based on what version of hadoop they are using. Apps such as hive and pig
> will need to manage picking a particular version of tez based on which hadoop profile they
> are building against.
> Any other suggestions for publishing version dependent jars?
> For binary releases, should we release only the minimal tarball, or both the minimal
> and full tarballs? The full tarball is the recommended deployment model as it is more robust
> towards compatibility on a changing cluster. It should work in most scenarios as long as the
> hadoop client libraries that Tez depends on are compatible with the servers running on the
> General questions for the community/past release managers: 
>   - Should we retain the simple version (i.e. plain x.y.z only) when building against
> the default version of hadoop as determined by Tez? This “default.version” will have a
> tendency to evolve over time :) . These simple version jars would be in addition to the version
> specific jars.
>   - What versions of hadoop should we compile against? 2.2, 2.4 and 2.6, or 2.2, 2.3, 2.4, 2.5, 2.6?
> Please note that I am ignoring the minor version so we should pick the latest version in
> each line, i.e. 2.2.1 over 2.2.0 if 2.2.1 exists.
> Any other comments? 
> thanks
> — Hitesh

Chris K Wensel
