tez-dev mailing list archives

From Andre Kelpe <ake...@concurrentinc.com>
Subject Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies
Date Wed, 11 Mar 2015 11:50:12 GMT
Not everybody is ready to move to a new Hadoop version every so often. As Chris already mentioned
it is a good idea to keep artifact names stable and detect features at runtime. We are doing
that in Cascading as well: We compile it against one version of Hadoop, but do everything
we can to keep it compatible with older and newer releases (currently 9 releases): https://github.com/Cascading/cascading.compatibility.
This is more work for us as an upstream, but makes the lives of our users a lot easier. Note
that we do not publish a release per version; we ensure that the one release is binary compatible.
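
To make "detect features at runtime" concrete, here is a minimal sketch in Java. It is not
code from Cascading or Tez; FeatureProbe and the class names passed to it are hypothetical.
The idea is to look up an optional class reflectively and fall back quietly when the running
Hadoop version does not provide it:

```java
import java.util.Optional;

/** Hypothetical helper: probes for optional classes at runtime instead of at build time. */
final class FeatureProbe {
    private FeatureProbe() {}

    /** Returns true if the named class can be loaded from the classpath (without initializing it). */
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, FeatureProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or one of its own dependencies is missing or incompatible.
            return false;
        }
    }

    /** Creates an instance via the public no-arg constructor, or returns empty if that is not possible. */
    static <T> Optional<T> tryCreate(String className, Class<T> type) {
        try {
            Class<?> raw = Class.forName(className, true, FeatureProbe.class.getClassLoader());
            return Optional.of(type.cast(raw.getDeclaredConstructor().newInstance()));
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Degrade gracefully: the caller decides whether to log and continue.
            return Optional.empty();
        }
    }
}
```

A client could call something like FeatureProbe.tryCreate("com.example.AclManager", Object.class)
(a placeholder name) and log a warning rather than fail when the result is empty.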

I believe Tez should provide a binary release that is tested and compatible with multiple
versions of hadoop, instead of “compile your own”. While I understand that the ASF only
demands source releases, I believe having binary releases, which are compatible with multiple
versions of hadoop, will help with adoption, since it removes friction downstream.

- André



> On 08 Mar 2015, at 22:54, Bikas Saha <bikas@hortonworks.com> wrote:
> 
> As an aside, Flink could consider moving to a more current version. There have been many
> key improvements in Timeline Server, preemption, node labels, resource monitoring etc. that
> users may want to take advantage of.
> 
> If Tez publishes Hadoop version specific binaries to maven then Flink and others may
> be able to consume them directly during development.
> 
> Bikas
> 
> -----Original Message-----
> From: Robert Metzger [mailto:rmetzger@apache.org] 
> Sent: Sunday, March 08, 2015 6:40 AM
> To: dev@tez.apache.org
> Subject: Re: [DISCUSS] Publishing and releasing jars for different hadoop version dependencies
> 
> Hi Hitesh,
> 
> I've talked about this with Kostas, let me check on some of our assumptions.
> 
> You can compile Flink against a hadoop1 and hadoop2 profile. We would include flink-on-tez
> only into our (default) hadoop2 profile.
> For that profile, we use Hadoop 2.2.0.
> 
> You can see on maven central, that we publish two versions of each flink module for each
> release, a 0.8.1-hadoop1 and a 0.8.1 version.
> This way users from both Hadoop APIs can use our system.
> 
> Adding Tez as a dependency to Flink (hadoop2) would cause a dependency conflict on the
> Hadoop version. Our parent pom enforces Hadoop 2.2.0 for all dependencies, so we force Tez
> to use Hadoop 2.2.0 as well.
> In my understanding the compilation fails in that case.
> 
> If there were a Tez version compatible with Hadoop 2.2.0 in mvn central, we could
> add the "flink-on-tez" module to maven central.
> 
> If that's not possible, users who want to use Flink-on-Tez have to compile Flink against
> Hadoop 2.6.0 themselves. It's only one maven command, but less convenient than something on
> mvn central.
> 
> 
> On Fri, Mar 6, 2015 at 8:03 PM, Hitesh Shah <hitesh@apache.org> wrote:
> 
>> Thanks for the feedback, Kostas,
>> 
>> One clarification though - are you saying Tez should publish jars to 
>> maven central built against different versions of Hadoop? If yes, is 
>> this mainly due to the hadoop dependencies that Tez pulls in or due to 
>> any incompatibilities that you have noticed?
>> 
>> thanks
>> — Hitesh
>> 
>> 
>> On Mar 6, 2015, at 9:03 AM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>> 
>>> Publishing jars for different Hadoop dependencies, and in particular 
>>> for Hadoop 2.2 would also be beneficial for Flink on Tez as we offer 
>>> maven archetypes for users to create Flink applications.  Currently, 
>>> we need to ask users that want to run Flink apps with Tez as backend 
>>> to compile the Flink code themselves due to a Hadoop version mismatch.
>>> 
>>> 
>>> 
>>> On Thu, Mar 5, 2015 at 1:46 AM, Hitesh Shah <hitesh@apache.org> wrote:
>>> 
>>>> From an ASF perspective, verifiable releases are only source releases. The
>>>> binaries are just convenience artifacts that can also be made available with a
>>>> given release. Hence in terms of supporting multiple hadoop versions, we do
>>>> want to allow various users/distros to compile Tez against their particular
>>>> version of hadoop.
>>>> 
>>>> From a run-time point of view, if Tez compiled against hadoop-2.6 is run
>>>> on a 2.4 cluster, it should work normally as long as acls are disabled
>>>> ( via tez config tez.am.acls.enabled ). That said, there are probably some
>>>> improvements that could be done to handle the case where acls are enabled
>>>> on a 2.4 cluster in a cleaner manner.
>>>> 
>>>> thanks
>>>> — Hitesh
>>>> 
>>>> On Mar 4, 2015, at 9:21 AM, Chris K Wensel <chris@wensel.net> wrote:
>>>> 
>>>>> compile what against hadoop 2.4? Tez? Hopefully no one except Tez devs
>>>>> ever compile Tez (once the apache committers offer up pre-built binaries, I
>>>>> only ever do for this reason).
>>>>> 
>>>>> if compiling application code against Tez and Hadoop 2.4, the jar won't
>>>>> come into play unless running tests (so i believe).
>>>>> 
>>>>> I would then enhance option two to gracefully fail if -acls (the
>>>>> Manager) is not applicable (on hadoop 2.4) but mistakenly included in the
>>>>> 2.4 classpath (testing app code against hadoop 2.4)
>>>>> 
>>>>> of course then this is really option 1 now with two jars.
>>>>> 
>>>>> ckw
>>>>> 
>>>>>> On Mar 2, 2015, at 3:05 PM, Hitesh Shah <hitesh@apache.org> wrote:
>>>>>> 
>>>>>> Thanks for the suggestions, Chris. Filed TEZ-2168 for this.
>>>>>> 
>>>>>> At this point, I am inclined to follow option 2 mainly to retain the
>>>>>> ability for users to compile against hadoop 2.4. I am not sure if there is
>>>>>> a simple and performant way ( without using reflection for all 2.6 specific
>>>>>> calls ) to retain compile compatibility with option 1.
>>>>>> 
>>>>>> Any other comments from other folks on this issue in general or on the 2
>>>>>> options that Chris suggested?
>>>>>> 
>>>>>> thanks
>>>>>> — Hitesh
>>>>>> 
>>>>>> 
>>>>>> On Feb 26, 2015, at 1:18 PM, Chris K Wensel <chris@wensel.net> wrote:
>>>>>> 
>>>>>>> The immediate issue is having two mutually exclusive artifacts:
>>>>>>> tez-yarn-timeline-history and tez-yarn-timeline-history-with-acls
>>>>>>> 
>>>>>>> outside of ATSHistoryACLPolicyManager, the code is identical. just the
>>>>>>> dependencies are changed.
>>>>>>> 
>>>>>>> TezClient attempts to load this Manager, under the assumption if it
>>>>>>> exists, it is running on hadoop 2.6. (running on 2.4 is fatal)
>>>>>>> 
>>>>>>> My recommendation would be never to change artifact names (or
>>>>>>> conditionally choose them) inside of major releases, but accreting
>>>>>>> new, optional, ones as versions progress is fine.
>>>>>>> 
>>>>>>> thus I would either:
>>>>>>> 
>>>>>>> create a single artifact tez-yarn-timeline-history compiled with a
>>>>>>> default dep of hadoop 2.6, that includes the Manager. update the TezClient
>>>>>>> code to gracefully fail if the Manager is not applicable (the runtime env
>>>>>>> is Hadoop 2.4).
>>>>>>> 
>>>>>>> or
>>>>>>> 
>>>>>>> offer tez-yarn-timeline-history-with-acls as an optional artifact for
>>>>>>> Hadoop 2.6 deployments, with the single Manager class in it, which in turn
>>>>>>> requires the tez-yarn-timeline-history artifact -- which is sufficient for
>>>>>>> a 2.4 runtime. if the user provides the additional -with-acls
>>>>>>> artifact, they are knowingly going to have problems on Hadoop 2.4.
>>>>>>> 
>>>>>>> I prefer the first as it keeps my build file simple. graceful
>>>>>>> degradation of services per environment (with appropriate logging)
>>>>>>> is a well accepted practice.
>>>>>>> 
>>>>>>> and you can now test Tez across multiple versions of Hadoop/Yarn at
>>>>>>> runtime (outside of compile time).
>>>>>>> 
>>>>>>> we do this with Cascading, just simple build file modifications to
>>>>>>> verify binary compatibility (vendors fork this repo to verify their
>>>>>>> distributions, and have been known to find critical bugs):
>>>>>>> 
>>>>>>> https://github.com/Cascading/cascading.compatibility
>>>>>>> 
>>>>>>> ckw
>>>>>>> 
>>>>>>>> On Feb 26, 2015, at 11:03 AM, Hitesh Shah <hitesh@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Hi folks,
>>>>>>>> 
>>>>>>>> Chris raised a good point earlier in terms of publishing jars for use
>>>>>>>> against different versions of hadoop. For the most part, I think we
>>>>>>>> have done well to ensure that the user-facing modules are version agnostic
>>>>>>>> but the same does not hold for other modules which at times are needed
>>>>>>>> by other applications for testing.
>>>>>>>> 
>>>>>>>> There aren’t really too many different options we can try. The
>>>>>>>> simplest option I can think of is just to build tez against
>>>>>>>> different versions of hadoop with the tez.version set to something
>>>>>>>> along the lines of “tez.version-hadoop.version”. This would imply having
>>>>>>>> tez-api-0.6.0-hadoop2.4 or tez-api-0.6.0-hadoop2.6. From a usability
>>>>>>>> point of view, depending on the option we pick, users will need to switch
>>>>>>>> their dependencies to point to an appropriate version based on what
>>>>>>>> version of hadoop they are using. For apps such as hive and pig,
>>>>>>>> they will need to manage picking a particular version of tez based
>>>>>>>> on which hadoop profile they are building against.
>>>>>>>> 
>>>>>>>> Any other suggestions for publishing version dependent jars?
>>>>>>>> 
>>>>>>>> For binary releases, should we release only the minimal tarball? or
>>>>>>>> both the minimal and full tarballs? The full tarball is the
>>>>>>>> recommended deployment model as it is more robust towards
>>>>>>>> compatibility on a changing cluster. It should work in most scenarios
>>>>>>>> as long as the hadoop client libraries that Tez depends on are
>>>>>>>> compatible with the servers running on the cluster.
>>>>>>>> 
>>>>>>>> General questions for the community/past release managers:
>>>>>>>> - Should we retain the simple version ( i.e. plain only x.y.z ) when
>>>>>>>> building against the default version of hadoop as determined by Tez? This
>>>>>>>> “default.version” will have a tendency to evolve over time :) .
>>>>>>>> These simple version jars would be in addition to the version specific jars.
>>>>>>>> - What versions of hadoop should we compile against? 2.2, 2.4 and 2.6
>>>>>>>> or 2.2,2.3,2.4,2.5,2.6 ? Please note that I am ignoring the minor version
>>>>>>>> so we should pick the latest version in each line i.e. 2.2.1 over 2.2.0 if
>>>>>>>> 2.2.1 exists.
>>>>>>>> 
>>>>>>>> Any other comments?
>>>>>>>> 
>>>>>>>> thanks
>>>>>>>> — Hitesh
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> —
>>>>>>> Chris K Wensel
>>>>>>> chris@wensel.net
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> —
>>>>> Chris K Wensel
>>>>> chris@wensel.net
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 

--
André Kelpe
andre@concurrentinc.com
http://concurrentinc.com




