gora-dev mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Workings of Hadoop Shims
Date Sun, 22 Feb 2015 23:52:49 GMT
Hi Folks,
I'm kicking off this overdue thread to obtain a good understanding of exactly
what's going on with the Hadoop Shims. The documentation is lacking at the
moment and I am therefore putting time into rectifying this.
My humble beginnings are below.

Scenario - Upgrade Nutch 2.3.1-SNAPSHOT to Gora 0.6
Jira Issue - https://issues.apache.org/jira/browse/NUTCH-1946
Observations - From my initial analysis of the current state of the Shims,
here is what I have found so far:

   - gora-shims-distribution relies upon gora-shims-hadoop,
   gora-shims-hadoop1 and gora-shims-hadoop2
   - gora-shims-hadoop provides a parent for gora-shims-hadoop1 and
   gora-shims-hadoop2, however it also has direct dependencies upon the
   following:
      - org.apache.hadoop:hadoop-client:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-hdfs:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-app:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-yarn-api:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-mapreduce-client-core:jar:2.5.2:compile
      - org.apache.hadoop:hadoop-annotations:jar:2.5.2:compile

   - As stated above, both gora-shims-hadoop1 and gora-shims-hadoop2 depend
   upon gora-shims-hadoop with the difference being that gora-shims-hadoop1
   then defines hadoop 1.X dependencies.

Problems - I understand that we have upgraded to Hadoop 2.5.2 by default.
This is great. What I am failing to get a grasp on, however, is exactly how
we provide guidance on upgrading to Gora 0.6 for users who do not also want
to upgrade from Hadoop 1.2.X --> 2.5.X.

Bear in mind that gora-core depends upon gora-shims-hadoop, so the
Hadoop 2.5.2 dependencies are automatically fetched in a transitive fashion
whenever we wish to upgrade the gora-core dependency from 0.5 --> 0.6.

I am going to experiment with using a bunch of exclusions in my pom.xml
under the gora-shims-hadoop dependency, e.g. exclude all of the above Hadoop
dependencies, then explicitly add the gora-shims-hadoop1 dependency.
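
A minimal sketch of what that experiment might look like in a pom.xml
follows. It is untested. The groupId org.apache.gora and the 0.6 versions
are assumptions, and the exclusions simply mirror the six Hadoop 2.5.2
artifacts listed above. Whether this alone yields a clean Hadoop 1.X
classpath is exactly what the experiment would need to confirm.

```xml
<!-- Fragment of a pom.xml <dependencies> section (sketch, not verified). -->
<dependencies>
  <!-- Declare gora-shims-hadoop directly so the Hadoop 2.5.2 artifacts
       it pulls in can be excluded. -->
  <dependency>
    <groupId>org.apache.gora</groupId>            <!-- assumed groupId -->
    <artifactId>gora-shims-hadoop</artifactId>
    <version>0.6</version>                        <!-- assumed version -->
    <exclusions>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-app</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-api</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
      </exclusion>
      <exclusion>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-annotations</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <!-- Explicitly add the Hadoop 1.X shim, which defines the Hadoop 1.X
       dependencies. -->
  <dependency>
    <groupId>org.apache.gora</groupId>            <!-- assumed groupId -->
    <artifactId>gora-shims-hadoop1</artifactId>
    <version>0.6</version>                        <!-- assumed version -->
  </dependency>
</dependencies>
```

Running mvn dependency:tree afterwards should show whether any 2.5.2
artifacts still arrive through another path.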

What makes this worse is that I cannot create profiles for this
upgrade, as I would be able to do in a Maven project, because I am working
with Ant + Ivy.
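
For the Ant + Ivy case, the same exclusion idea could be sketched in an
ivy.xml roughly as follows. This is an untested fragment. The org value
org.apache.gora, the rev values and the conf mapping "*->default" are
assumptions, and only two of the Hadoop 2.5.2 modules are shown; the
remaining ones from the list above would follow the same pattern.

```xml
<!-- Fragment of an ivy.xml <dependencies> section (sketch, not verified). -->
<dependencies>
  <!-- gora-core brings in gora-shims-hadoop transitively, so the
       Hadoop 2.5.2 modules are excluded on this dependency. -->
  <dependency org="org.apache.gora" name="gora-core" rev="0.6"
              conf="*->default">
    <exclude org="org.apache.hadoop" module="hadoop-client"/>
    <exclude org="org.apache.hadoop" module="hadoop-hdfs"/>
    <!-- further <exclude> elements for the other 2.5.2 modules -->
  </dependency>

  <!-- Explicitly add the Hadoop 1.X shim. It also depends upon
       gora-shims-hadoop, so it may need the same excludes. -->
  <dependency org="org.apache.gora" name="gora-shims-hadoop1" rev="0.6"
              conf="*->default"/>
</dependencies>
```

The excludes are listed per module rather than for the whole
org.apache.hadoop organisation so that the Hadoop 1.X artifacts are not
filtered out along with the 2.5.2 ones.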

Any thoughts would be very much appreciated. Essentially, whatever we
discuss here is creating the foundation for the Gora Shims documentation.



