hbase-user mailing list archives

From Andre Reiter <a.rei...@web.de>
Subject Running MapReduce from a web application
Date Wed, 22 Jun 2011 09:04:43 GMT
Hi everybody,

getting a MapReduce job to run was not easy at all, especially when third-party jars are involved...
the Cloudera article was a good help: http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

i still cannot get the -libjars argument to work for running an MR job with third-party jars, as described
in the first option.
for some reason it does not work for me... the tasks fail with a java.lang.ClassNotFoundException because
the classes from the third-party lib are not found.
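A common cause of exactly this failure is that -libjars is only honored when the driver class parses its arguments through GenericOptionsParser (typically by running via ToolRunner); a plain main() that builds its own Job silently ignores the flag. A hedged sketch of an invocation that should work under that assumption (the jar and path names here are just examples):

```shell
# The client JVM also needs the jar on its own classpath:
export HADOOP_CLASSPATH=/tmp/lib/third-party.jar

# -libjars must come after the main class and before the job's own arguments,
# and is only picked up if package.HBaseReader goes through ToolRunner /
# GenericOptionsParser.
./bin/hadoop jar /tmp/my.jar package.HBaseReader \
    -libjars /tmp/lib/third-party.jar \
    /tmp/input /tmp/requests
```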

the second option, including the referenced JARs in the lib subdirectory of the submittable JAR,
actually works fine for me, starting a job from the shell like this: ./bin/hadoop jar
/tmp/my.jar package.HBaseReader
not the most elegant way, but it finally works.

now i would like to start MR jobs from my web application running on Tomcat. is there an
elegant way to do this with third-party jars?
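One way to avoid shelling out from Tomcat is to submit the job programmatically and attach the extra jars through the DistributedCache, which has the same effect as -libjars. A minimal sketch, assuming the Hadoop 0.20 API (CDH3) and that the third-party jars were copied into HDFS beforehand; all paths and names here are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WebJobLauncher {
    public Job launch() throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "requests-job");
        // Points the framework at the jar containing this class; in a
        // webapp this works when the class lives in a jar under WEB-INF/lib.
        job.setJarByClass(WebJobLauncher.class);

        // Ship extra jars to the task classpath (same effect as -libjars);
        // /libs/third-party.jar is an assumed HDFS path.
        DistributedCache.addFileToClassPath(
            new Path("/libs/third-party.jar"), job.getConfiguration());

        FileInputFormat.addInputPath(job, new Path("/tmp/input"));
        FileOutputFormat.setOutputPath(job, new Path("/tmp/requests"));
        job.submit(); // non-blocking; use waitForCompletion(true) to block
        return job;
    }
}
```

Submitting asynchronously with job.submit() keeps the Tomcat request thread from blocking for the whole job; the returned Job can be polled for status later.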

the third option described in the article is to place the jars on every TaskTracker, which
is IMHO not the best approach either, just like the second...

my second question: at the moment i use TextOutputFormat as the output format, which creates
a file in the specified DFS directory: part-r-00000
so i can read it using ./bin/hadoop fs -cat /tmp/requests/part-r-00000 from the shell.

how can i get the path to this output file after my job is finished, so i can process it further?
is there another way to collect the results of an MR job? a text file is fine for humans, but IMHO
parsing a text file for results is not the preferable way...
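The output directory is whatever was passed to FileOutputFormat.setOutputPath, so the part files can be read back through the FileSystem API rather than by shelling out to hadoop fs -cat. A sketch under that assumption (the output path is an example; TextOutputFormat writes key TAB value per line):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputReader {
    // Read every reducer output file under the job's output directory.
    public static void readResults(Configuration conf, Path outputDir)
            throws Exception {
        FileSystem fs = FileSystem.get(conf);
        // Match part-r-00000, part-r-00001, ... (one per reducer).
        for (FileStatus status : fs.globStatus(new Path(outputDir, "part-r-*"))) {
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(status.getPath())));
            String line;
            while ((line = reader.readLine()) != null) {
                // TextOutputFormat separates key and value with a tab.
                String[] kv = line.split("\t", 2);
                System.out.println(kv[0] + " -> " + (kv.length > 1 ? kv[1] : ""));
            }
            reader.close();
        }
    }
}
```

If parsing text is the objection, switching the job to SequenceFileOutputFormat and reading the results back with a SequenceFile.Reader keeps the keys and values as typed Writables instead of strings.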

thanks in advance

  - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
  - hadoop-0.20.2-CDH3B4
  - hbase-0.90.1-CDH3B4
  - zookeeper-3.3.2-CDH3B4
