mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: running Dirichlet example on AEMR
Date Mon, 18 May 2009 19:57:13 GMT
Indeed, I can create the same problem in Kmeans by using my own custom 
distance measure:

java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.kmeans.CustomEuclideanDistanceMeasure
    at org.apache.mahout.clustering.canopy.Canopy.configure(Canopy.java:113)
    at org.apache.mahout.clustering.canopy.CanopyMapper.configure(CanopyMapper.java:49)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.kmeans.CustomEuclideanDistanceMeasure
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at org.apache.mahout.clustering.canopy.Canopy.configure(Canopy.java:109)
    ... 8 more

This indicates that the classloader for the Mahout jar in lib does not
have the examples job loader as its parent. I can run both examples fine
in Eclipse.
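
One way to check the loader relationship on the cluster would be to print
the parent chain of each loader involved and to try resolving the class by
name. The sketch below only illustrates that idea. LoaderCheck, loadClass,
canLoad and describeChain are hypothetical names, not part of Mahout or
Hadoop, and whether either loader can actually see the job's classes on
AEMR is exactly the open question.

```java
class LoaderCheck {

    // Hypothetical helper: try the thread's context classloader first,
    // then fall back to the loader that defined this class.
    static Class<?> loadClass(String name) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(name, true, context);
            } catch (ClassNotFoundException e) {
                // Not visible from the context loader; try the defining loader.
            }
        }
        return Class.forName(name, true, LoaderCheck.class.getClassLoader());
    }

    // Convenience wrapper that reports visibility instead of throwing.
    static boolean canLoad(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Walks getParent() from the given loader up to the bootstrap loader,
    // which Java represents as null.
    static String describeChain(ClassLoader loader) {
        StringBuilder chain = new StringBuilder();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            chain.append(cl.getClass().getName()).append(" -> ");
        }
        return chain.append("bootstrap").toString();
    }

    public static void main(String[] args) {
        // Pass the class name to probe as the first argument.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println("context loader chain:  "
            + describeChain(Thread.currentThread().getContextClassLoader()));
        System.out.println("defining loader chain: "
            + describeChain(LoaderCheck.class.getClassLoader()));
        System.out.println(name + " loadable: " + canLoad(name));
    }
}
```

Calling describeChain from inside a mapper's configure() and comparing the
output with a local run should show whether the two environments set up
their loaders differently.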

Jeff


Jeff Eastman wrote:
> Hi Sebastian,
>
> For some reason this was the first post I've seen on this topic. There 
> is something wrong with the Dirichlet jar layout that makes the 
> classloader throw a CNF exception. I noticed this when we were 
> proofing the release and we discussed it on this list without resolution:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:97)
>    at org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:61)
>    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
>    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>
> Is this the same exception you saw before moving the DirichletJob?
>
> I think the problem is that the classloader for the DirichletMapper 
> and other classes, located in the lib, cannot find the
> NormalScModelDistribution, located in the jar. We were seeing a 
> slightly different manifestation earlier, dunno why.
>
> I think trying to use a custom distance measure with kmeans would have 
> a similar result. Moving the Job only postponed the problem to the 
> Mapper.
>
> Jeff
>
> Sebastien Bratieres wrote:
>> Hi Grant,
>>
>> It doesn't look like the CLI has anything to do with my issue -- it's just a
>> command-line interface to drive the Amazon machines and jobs you run there
>> remotely. It sends HTTP requests to Amazon to switch machines on and off,
>> start jobs etc. My issue is linked to the AEMR setup or to something
>> peculiar with classloading and the Dirichlet sample (that's because the
>> kMeans example runs fine).
>> If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys, I
>> think I'll try with AEMR staff.
>>
>> Thanks
>> Sebastien
>>
>> 2009/5/18 Grant Ingersoll <gsingers@apache.org>
>>
>>  
>>> I don't know much about AEMR, so, tell me more about the Ruby CLI stuff?
>>> Does that factor in?
>>>
>>>
>>>
>>> On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:
>>>
>>>  Hi,
>>>    
>>>> I am still trying to make this work. I am running AEMR with the latest
>>>> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
>>>> ruby elastic-mapreduce -j j-26RJO9A4WJIJS \
>>>>   --jar s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job \
>>>>   --main-class org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job \
>>>>   --arg s3n://myBucket/mahout-input/synthetic-control.data \
>>>>   --arg s3n://myBucket/mahout-output/dirichlet \
>>>>   --arg org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution \
>>>>   --arg 10 --arg 5 --arg 1.0 --arg 1
>>>>
>>>> This gave me the class not found error mentioned in my previous email.
>>>>
>>>> I have tried the following: I moved the DirichletJob class from the core
>>>> project into the examples project, putting it in
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale for
>>>> doing that is that in this way, the classloader does not need to look into
>>>> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
>>>> finds it directly alongside Job.class.
>>>>
>>>> This got me one step further, but an error of the same type stops me
>>>> again:
>>>>
>>>> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>>   at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
>>>>   at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
>>>>   ... 8 more
>>>>
>>>> This happens on a .loadClass() from the current thread's classloader.
>>>>
>>>> I have tried running this example on my local single-node Hadoop
>>>> installation: this runs fine. The error above occurs only with Amazon
>>>> Elastic MapReduce, and definitely seems related to classloading issues.
>>>>
>>>> Any ideas?
>>>>
>>>> Thanks
>>>> Sebastien
>>>>
>>>> 2009/5/15 Sebastien Bratieres <sb358@cam.ac.uk>
>>>>
>>>>  Hi,
>>>>      
>>>>> Thanks Grant, that did it. I'll figure out later what's going on.
>>>>>
>>>>> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I
>>>>> want to run the Dirichlet example, which I launch with
>>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
>>>>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>>>>
>>>>> This fails with
>>>>> java.lang.NoClassDefFoundError: org/apache/mahout/clustering/dirichlet/DirichletJob
>>>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>>>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>>>>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>>>>
>>>>> DirichletJob is located in the .job file, inside
>>>>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't
>>>>> find it.
>>>>>
>>>>> One difference between kMeans and Dirichlet is
>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>>>>>   JobConf conf = new JobConf(Job.class);
>>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>>>>>   JobConf conf = new JobConf(DirichletJob.class);
>>>>> i.e. the Dirichlet version uses a job class which is in core, while the
>>>>> kMeans version uses the currently executing Job class from examples. Is
>>>>> there an issue with this?
>>>>>
>>>>> What should I do to work around this error? Does the MANIFEST.MF file
>>>>> of the .job contain a pointer to the /lib directory so that the jars
>>>>> there are visible to the jar classloader?
>>>>>
>>>>> Thanks
>>>>> Sebastien
>>>>>
>>>>>
>>>>> 2009/5/14 Grant Ingersoll <gsingers@apache.org>
>>>>>
>>>>>  Try running mvn install from the top level dir first.
>>>>>        
>>>>>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>          
>>>>>>> I'd like to walk in the footsteps of Stephen Green running Mahout on
>>>>>>> EMR.
>>>>>>>
>>>>>>> He points out that the fix to issue 118 is needed to do that (I first
>>>>>>> ran into the file system error too). I'm a first-time Maven user and I
>>>>>>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>>>>>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>>>>>>> - highlight mahout-examples project
>>>>>>> - right-click Run As / Maven package (though I'm not sure at all that
>>>>>>> Maven package is the right option to use!)
>>>>>>>
>>>>>>> but that gives me this error
>>>>>>> ---
>>>>>>> [INFO] Scanning for projects...
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] Building Mahout examples
>>>>>>> [INFO]
>>>>>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>> [INFO] task-segment: [package]
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] [resources:resources]
>>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>>> [INFO] Copying 0 resource
>>>>>>> [INFO] [resources:copy-resources]
>>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>>> [INFO] Copying 3 resources
>>>>>>> [INFO] [compiler:compile]
>>>>>>> [INFO] Nothing to compile - all classes are up to date
>>>>>>> [INFO] [resources:testResources]
>>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>>> [INFO] Copying 3 resources
>>>>>>> [ERROR]
>>>>>>>
>>>>>>> Transitive dependency resolution for scope: test has failed for your
>>>>>>> project.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Error message: Missing:
>>>>>>> ----------
>>>>>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>>>
>>>>>>> Try downloading the file manually from the project website.
>>>>>>>
>>>>>>> Then, install it using the command:
>>>>>>>   mvn install:install-file -DgroupId=org.apache.mahout
>>>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>>>> -Dpackaging=test-jar -Dfile=/path/to/file
>>>>>>>
>>>>>>> Alternatively, if you host your own repository you can deploy the file
>>>>>>> there:
>>>>>>>   mvn deploy:deploy-file -DgroupId=org.apache.mahout
>>>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>>>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>>>>>>> -DrepositoryId=[id]
>>>>>>>
>>>>>>> Path to dependency:
>>>>>>>     1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>>     2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>>>
>>>>>>> ----------
>>>>>>> 1 required artifact is missing.
>>>>>>>
>>>>>>> for artifact:
>>>>>>> org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>>
>>>>>>> from the specified remote repositories:
>>>>>>> Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>>>>>> maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>>>>> central (http://repo1.maven.org/maven2)
>>>>>>>
>>>>>>> Group-Id: org.apache.mahout
>>>>>>> Artifact-Id: mahout-examples
>>>>>>> Version: 0.2-SNAPSHOT
>>>>>>> From file: C:\workspace\mahout\examples\pom.xml
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] For more information, run with the -e flag
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] BUILD FAILED
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>> [INFO] Total time: 6 seconds
>>>>>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>>>>>> [INFO] Final Memory: 3M/22M
>>>>>>> [INFO] ------------------------------------------------------------------------
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> So again, my goal is to have a new mahout-examples-1.0.job file or
>>>>>>> equivalent that contains the patch for 118 and will run on EMR. What
>>>>>>> is the right way to do this?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sebastien
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> --------------------------
>>>>>> Grant Ingersoll
>>>>>> http://www.lucidimagination.com/
>>>>>>
>>>>>> Search the Lucene ecosystem 
>>>>>> (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>>>> Solr/Lucene:
>>>>>> http://www.lucidimagination.com/search
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
>>> using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>     
>>
>>   
>
>
>

