mahout-user mailing list archives

From Sean Owen <>
Subject Re: The perennial "Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector" problem
Date Mon, 09 May 2011 15:41:49 GMT
Ah, OK. The trickiness there is that we don't know the location of the
jar. (Right?) The user can tell us, though they're then specifying it
twice on the command line, once to Hadoop and once to us. At least I
don't know of something smarter.
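To make the jar-location problem concrete: Hadoop's `setJarByClass` (described in Benson's quoted message) does not know the jar either. It asks the class loader which classpath entry a class was loaded from. Below is a minimal stdlib-only sketch of that lookup, modeled loosely on what we believe Hadoop's `JobConf.findContainingJar` does. The class name `JarLocator` and the example paths are hypothetical. The sketch also shows the trap. If Mahout's classes sit in a job jar's `lib/` directory and `hadoop jar` has unpacked it, the lookup can land on the inner library jar rather than on the job jar the user submitted.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;

public class JarLocator {

    /** Turns "jar:file:/path/x.jar!/pkg/C.class" into "/path/x.jar"; returns null for non-jar URLs. */
    static String jarPathFromUrl(String url) {
        if (!url.startsWith("jar:")) {
            return null; // e.g. "file:" (loose classes directory) or "jrt:" (JDK modules)
        }
        String path = url.substring("jar:".length());
        if (path.startsWith("file:")) {
            path = path.substring("file:".length());
        }
        int bang = path.indexOf('!');
        if (bang >= 0) {
            path = path.substring(0, bang); // drop the entry name inside the jar
        }
        return URLDecoder.decode(path, StandardCharsets.UTF_8); // undo %20 etc.
    }

    /** Returns the first jar on the class loader's path that contains the class, or null. */
    static String findContainingJar(Class<?> clazz) throws IOException {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            return null; // bootstrap classes have no visible class loader
        }
        String classFile = clazz.getName().replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(classFile);
        while (urls.hasMoreElements()) {
            String jar = jarPathFromUrl(urls.nextElement().toString());
            if (jar != null) {
                return jar; // first match wins, whichever jar that happens to be
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URL for a Mahout class found in an unpacked job jar's lib/ directory.
        System.out.println(jarPathFromUrl(
            "jar:file:/tmp/hadoop-unjar123/lib/mahout-math.jar!/org/apache/mahout/math/Vector.class"));
        // Prints null when this class was loaded from a directory rather than a jar.
        System.out.println(findContainingJar(JarLocator.class));
    }
}
```

In the `lib/` case the derived path points at the Mahout library jar, so that jar, not the user's job jar with its other dependencies, is what the job would presumably ship.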

Is there any better interim solution than just packaging it all up
into one .jar? That obviates this issue. (That doesn't personally
offend my sense of hackiness and propriety anyway, but I do see the
arguments there.) Because it looks like we need to do *something* for

And then I bet there's a better long-term answer, even if I don't know
what it is. Heck, if someone does know and it's not too hard, I'll
make it happen now.


On Mon, May 9, 2011 at 4:36 PM, Benson Margulies <> wrote:
> The 'lib/' convention is not a feature of Java; it's a feature of Hadoop.
> It is activated by calling the 'setJar' API in the job conf, passing
> the name of the jar that contains the lib folder.
> As a convenience (and a trap for the unwary), there is also
> setJarByClass. This takes a Class<?> instead of a string jar path and
> attempts to derive the jar name from the class reference.
> Mahout then has a series of self-contained classes that create JobConf
> objects, and make calls to setJarByClass, passing Whatever.class. If
> one of those classes somehow wanders into lib/ (like, a person
> building a job jar puts mahout into 'lib/' and then tries to use a
> Mahout job class) the call to setJarByClass is at best ineffective and
> at worst destructive.
> On Mon, May 9, 2011 at 11:07 AM, Jake Mannix <> wrote:
>> Benson,
>>  Can you remind me what the "setJarByClass" issue is again?
>> On May 9, 2011 6:30 AM, "Benson Margulies" <> wrote:
>> I see no reason to stop using the 'lib/' convention in our jobs.
>> There are apparently plenty of people out there who don't know
>> anything about the distributed cache. If we require its use to run
>> simple jobs, we're going to be up to our ears in support email.
>> I favor the following strategy:
>> 1) Make sure that the split between 'lib/' and unpacked classes in
>> our job jars is *correct* so that all the operations of the mahout
>> command work out of the box.
>> 2) Post 0.5, act on the proposed refactoring so that none of our code
>> calls setJarByClass in a way that forces users to do complex
>> re-shading for themselves. That's the 'bean' proposal, in which each
>> of our jobs is a bean, and a user who wants to combine ours and theirs
>> can make their own call to setJar/setJarByClass appropriately.
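The 'bean' proposal can be pictured roughly as follows. This is a hypothetical sketch only: `SimilarityJobBean`, its setters, and the `example.*` configuration keys are invented for illustration, and a plain `Map` stands in for Hadoop's `JobConf`. The point is the division of labor. The job bean records its own settings but never picks a jar, so the caller, who knows how the job jar was assembled, makes the single `setJar` or `setJarByClass` call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical bean-style job: it holds its settings but never chooses a jar itself. */
public class SimilarityJobBean {
    private String inputPath;
    private String outputPath;

    public void setInputPath(String inputPath) { this.inputPath = inputPath; }
    public void setOutputPath(String outputPath) { this.outputPath = outputPath; }

    /** Copies this job's settings into a configuration map; the caller owns the jar choice. */
    public void configure(Map<String, String> conf) {
        if (inputPath == null || outputPath == null) {
            throw new IllegalStateException("input and output paths must be set");
        }
        conf.put("example.input", inputPath);   // hypothetical key
        conf.put("example.output", outputPath); // hypothetical key
        // Deliberately no setJar/setJarByClass equivalent here.
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        SimilarityJobBean job = new SimilarityJobBean();
        job.setInputPath("/data/in");
        job.setOutputPath("/data/out");
        job.configure(conf);
        // The user, who built the combined job jar, names it exactly once.
        conf.put("example.jar", "/home/me/my-combined-job.jar"); // hypothetical key and path
        System.out.println(conf);
    }
}
```

In a real Hadoop job the final `conf.put` would instead be the caller's own `JobConf.setJar(...)` call, made once for the combined jar.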
