Hey Matthias,
I cc'd everyone else on here, but since this was your module, I thought it
best to solicit your opinion before refactoring it.
We never managed to get crunch-archetypes working w/hadoop 2.x, which is
apparently deprecating the lib/* trick for including client dependencies in
favor of the -libjars option (see
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ and
http://architects.dzone.com/articles/using-libjars-option-hadoop )
The way that I have found to do this in Maven is to use the
copy-dependencies goal of the maven-dependency-plugin and include a shell
script in a bin/ directory that knows how to set up the HADOOP_CLASSPATH and
-libjars arguments for use with hadoop jar. Although this approach is more
complex than the lib/* trick, it will be able to support hadoop 1.x as well
as hadoop 2.x.
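To make the idea concrete, here is a rough sketch of what the bin/ script might look like. It is only an illustration: the function names, the lib directory (wherever copy-dependencies is configured to put the jars), and the job jar and main class are all hypothetical, and it assumes the job's main class goes through ToolRunner so that -libjars is actually parsed. It prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical launcher sketch; names and directory layout are assumptions,
# not anything that exists in crunch-archetypes today.

# Join every jar in a directory with the given delimiter.
# $1 = directory of copied dependencies, $2 = delimiter
# (":" for HADOOP_CLASSPATH, "," for -libjars)
join_jars() {
  dir="$1"
  sep="$2"
  out=""
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # skip the literal glob when the dir has no jars
    out="${out:+$out$sep}$jar"
  done
  printf '%s' "$out"
}

# Print the hadoop invocation rather than executing it.
# $1 = lib dir, $2 = job jar, $3 = main class, remaining args = job arguments
build_hadoop_command() {
  lib_dir="$1"
  job_jar="$2"
  main_class="$3"
  shift 3
  # -libjars is a generic option, so it goes after the main class
  # and before the job's own arguments.
  printf 'HADOOP_CLASSPATH=%s hadoop jar %s %s -libjars %s' \
    "$(join_jars "$lib_dir" ":")" "$job_jar" "$main_class" \
    "$(join_jars "$lib_dir" ",")"
  if [ "$#" -gt 0 ]; then
    printf ' %s' "$@"
  fi
  printf '\n'
}
```

HADOOP_CLASSPATH covers the client-side JVM, while -libjars ships the same jars to the tasks, which is why the one directory gets joined twice with different delimiters.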
Do you have any objections to me taking this on, and/or any other landmines
I should keep an eye out for?
Thanks!
Josh
--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>