mahout-user mailing list archives

From Drew Farris <>
Subject Re: Classpath question
Date Sat, 11 Sep 2010 03:16:17 GMT
Hi Mark,

I've found that jobs loaded as classes contained in jar files within
the lib directory of a job jar have issues loading classes from jars
also in the lib directory. For example, I create a job jar that
includes all of the mahout (and dependency) jars in lib and execute
org.apache.mahout.classifier.bayes.TrainClassifier (from mahout-core
jar) and run into ClassNotFound exceptions for TokenStream, which
happens to be contained within the lucene jars in the job jar's lib
directory.

(command-line: hadoop jar target/mahout-drew-1.0-SNAPSHOT-job.jar
org.apache.mahout.classifier.bayes.TrainClassifier ..related args..)
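
To confirm where a class actually lives inside a job jar, a small script along these lines may help. This is only a diagnostic sketch: the function name locate_class and the "<top level>" label are hypothetical names chosen for illustration, and it assumes dependency jars sit directly under lib/ in the job jar. It reports where the class file is stored; it says nothing about how Hadoop will load it.

```python
import io
import zipfile


def locate_class(job_jar, class_name):
    """Return the places inside a job jar where class_name is stored.

    Each result is "<top level>" for a class file stored directly in the
    job jar, or the entry name of a nested jar such as "lib/foo.jar".
    (Hypothetical helper, not part of Hadoop or Mahout.)
    """
    # A fully qualified class name maps to a path ending in .class.
    entry = class_name.replace(".", "/") + ".class"
    found = []
    with zipfile.ZipFile(job_jar) as outer:
        names = outer.namelist()
        if entry in names:
            found.append("<top level>")
        for name in names:
            if name.startswith("lib/") and name.endswith(".jar"):
                # Nested jars are zip archives too; read them in memory.
                with zipfile.ZipFile(io.BytesIO(outer.read(name))) as inner:
                    if entry in inner.namelist():
                        found.append(name)
    return found
```

For the case above, one would call locate_class("target/mahout-drew-1.0-SNAPSHOT-job.jar", "org.apache.lucene.analysis.TokenStream") and look at which lib/ jar, if any, is reported.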

I've worked around the problem by unrolling the mahout-core jar and
placing those classes at the top level of my hand-rolled job jar.
You'll see that the mahout-examples-VERSION.job provided with Mahout
does the same. It is generally easier to use the mahout-examples job
file or mahout command-line utility to run Mahout jobs.
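
The hand-rolled layout described above could be assembled roughly as follows. This is a sketch under assumptions, not the way the Mahout build produces its job file: build_job_jar and its parameters are hypothetical names, and skipping the core jar's META-INF/ entries is a simplification that would need revisiting if anything in there is required at runtime.

```python
import os
import zipfile


def build_job_jar(out_path, core_jar, lib_jars):
    """Write a job jar with core_jar's entries unrolled at the top level
    and each jar in lib_jars stored unmodified under lib/.
    (Hypothetical helper for illustration only.)
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out:
        with zipfile.ZipFile(core_jar) as core:
            for info in core.infolist():
                # Assumption: the core jar's manifest and other META-INF
                # entries are not needed in the job jar.
                if info.is_dir() or info.filename.startswith("META-INF/"):
                    continue
                # Keep the package path, so classes land at the top level.
                out.writestr(info.filename, core.read(info.filename))
        for jar in lib_jars:
            # Dependency jars stay intact inside lib/.
            out.write(jar, "lib/" + os.path.basename(jar))
```

Here core_jar would be the jar containing the job's driver, mapper and reducer classes (mahout-core in this thread), and lib_jars the remaining dependencies.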

In your example, I also noticed you include your classes within a
subdirectory named 'src' which may also cause problems unless that
indeed is the name of your top level package directory. You can see
that the mahout-examples-VERSION.job file includes the class package
tree at the top level of the job file.
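
A quick way to spot a stray 'src' prefix is to list the first path component of every class file in the jar. The sketch below does that; class_roots is a hypothetical name, and the check is purely about the archive layout.

```python
import zipfile


def class_roots(job_jar):
    """Return the sorted first path components of all .class entries.
    (Hypothetical helper for illustration only.)
    """
    roots = set()
    with zipfile.ZipFile(job_jar) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                # "org/apache/Foo.class" -> "org"; a class in the default
                # package yields its own file name.
                roots.add(name.split("/", 1)[0])
    return sorted(roots)
```

If the result contains 'src' rather than the first element of your package name (for example 'org' or 'com'), the classes are probably nested one directory too deep.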

Although your specific problem doesn't seem to be related to 0.3, you
should consider checking out the mahout sources from trunk and building
from there. There have been a number of fixes and improvements since
0.3 - See:
for instructions,



On Fri, Sep 10, 2010 at 9:44 PM, Mark <> wrote:
>  Perhaps this is a better place to post this (Originally posted to Hadoop)
> If I submit a jar that has a lib directory that contains a bunch of jars,
> shouldn't those jars be in the classpath and available to all nodes?
> The reason I ask this is because I am trying to submit a jar myjar.jar that
> has the following structure
> --src
>  \.... (My source classes)
> -- lib
>  \
>   -- mahout-collections-0.3.jar
>   -- mahout-core-0.3.jar
>   -- mahout-math-0.3.jar
>   -- hbase-0.20.0.jar
>   -- commons-cli-2.0-mahout.jar
> Now the job I am trying to run is actually part of mahout-core-0.3... not
> src. The job starts but then fails with the following error
> 10/09/10 03:15:13 INFO mapred.JobClient: Task Id :
> attempt_201009100306_0003_r_000000_0, Status : FAILED
> java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
>    at
> org.apache.hadoop.util.ReflectionUtils.newInstance(
>    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(
>    at
>    at org.apache.hadoop.mapred.Child.main(
> Caused by: java.lang.reflect.InvocationTargetException
>    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>    at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(
>    at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
>    at java.lang.reflect.Constructor.newInstance(
>    at
> org.apache.hadoop.util.ReflectionUtils.newInstance(
>    ... 3 more
> Caused by: java.lang.NoClassDefFoundError:
> org/apache/mahout/math/map/OpenObjectIntHashMap
>    at
> org.apache.mahout.fpm.pfpgrowth.ParallelFPGrowthReducer.<init>(
>    ... 8 more
> Apparently it can't find the
> org/apache/mahout/math/map/OpenObjectIntHashMap.class although that class is
> definitely in mahout-collections-0.3.jar. Is this a problem because the
> mahout-core-0.3.jar doesn't have a lib directory?
> What is an easy way around this?
> Thanks
