mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: running Dirichlet example on AEMR
Date Mon, 18 May 2009 19:23:39 GMT
Hi Sebastian,

For some reason this was the first post I've seen on this topic. There
is something wrong with the Dirichlet jar layout that makes the
classloader throw a ClassNotFoundException. I noticed this when we were
proofing the release, and we discussed it on this list without resolution:

java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:97)
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:61)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)

Is this the same exception you saw before moving the DirichletJob?

I think the problem is that the classloader for the DirichletMapper and
other classes, located in the lib, cannot find the
NormalScModelDistribution, located in the jar. We were seeing a
slightly different manifestation earlier, dunno why.
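
One way to check this kind of visibility problem is to log which loader
defined each class and what that loader's parents are. Under the usual
parent-delegation model, a class can only be resolved through the loader
asking for it or one of that loader's ancestors. The following is a minimal
diagnostic sketch, not Mahout or Hadoop code; LoaderChain and describe are
hypothetical names chosen for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of Mahout or Hadoop.
final class LoaderChain {

    private LoaderChain() {}

    /**
     * Returns the class names of the given loader and all of its parents,
     * ending with "bootstrap" for the JVM's built-in loader (represented as null).
     */
    static List<String> describe(ClassLoader start) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        // Loader that defined this class versus the loader the current thread carries.
        System.out.println("defining loader: "
                + describe(LoaderChain.class.getClassLoader()));
        System.out.println("context loader:  "
                + describe(Thread.currentThread().getContextClassLoader()));
    }
}
```

Calling describe(...) with the loader of the mapper class and again with the
thread's context loader, from inside the task, would show whether the two
chains differ.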

I think trying to use a custom distance measure with kmeans would have a 
similar result. Moving the Job only postponed the problem to the Mapper.
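
Both cases come down to a class named in the job configuration being loaded
reflectively. A common workaround for this kind of problem, offered here only
as a sketch and not as what Mahout or Hadoop actually does, is to try the
thread's context classloader first and fall back to the loader of the calling
class. ClassLookup and load are hypothetical names:

```java
// Hypothetical helper for illustration; not part of Mahout or Hadoop.
final class ClassLookup {

    private ClassLookup() {}

    /**
     * Loads a class by name, trying the thread context classloader first
     * and falling back to the loader that defined this helper.
     */
    static Class<?> load(String className) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException ignored) {
                // Not visible from the context loader; try the defining loader next.
            }
        }
        // A null defining loader means the bootstrap loader, which Class.forName accepts.
        return Class.forName(className, true, ClassLookup.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Class<?> type = load(name);
        System.out.println(type.getName() + " loaded by " + type.getClassLoader());
    }
}
```

Whether this helps on AEMR depends on which loader the framework installs as
the context loader for the task, so it is something to try rather than a
confirmed fix.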

Jeff

Sebastien Bratieres wrote:
> Hi Grant,
>
> It doesn't look like the CLI has anything to do with my issue -- it's just a
> command-line interface to drive the Amazon machines and jobs you run there
> remotely. It sends HTTP requests to Amazon to switch machines on and off,
> start jobs etc. My issue is linked to the AEMR setup or to something
> peculiar with classloading and the Dirichlet sample (that's because the
> kMeans example runs fine).
> If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys, I
> think I'll try with AEMR staff.
>
> Thanks
> Sebastien
>
> 2009/5/18 Grant Ingersoll <gsingers@apache.org>
>
>   
>> I don't know much about AEMR, so, tell me more about the Ruby CLI stuff?
>>  Does that factor in?
>>
>>
>>
>> On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:
>>
>>  Hi,
>>     
>>> I am still trying to make this work. I am running AEMR with the latest
>>> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
>>> ruby elastic-mapreduce -j j-26RJO9A4WJIJS --jar
>>> s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job --main-class
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job --arg
>>> s3n://myBucket/mahout-input/synthetic-control.data --arg
>>> s3n://myBucket/mahout-output/dirichlet --arg
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>>> --arg 10 --arg 5 --arg 1.0 --arg 1
>>>
>>> This gave me the class not found error mentioned in my previous email.
>>>
>>> I have tried the following: I moved the DirichletJob class from the core
>>> project into the examples project, putting it in
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale for
>>> doing that is that in this way, the classloader does not need to look into
>>> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
>>> finds it directly alongside Job.class.
>>>
>>> This got me one step further, but an error of the same type stops me
>>> again:
>>>
>>> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>   at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
>>>   at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
>>>   ... 8 more
>>>
>>> This happens on a .loadClass() from the current thread's classloader.
>>>
>>> I have tried running this example on my local single-node Hadoop
>>> installation: this runs fine. The error above occurs only with Amazon
>>> Elastic MapReduce, and definitely seems related to classloading issues.
>>>
>>> Any ideas ?
>>>
>>> Thanks
>>> Sebastien
>>>
>>> 2009/5/15 Sebastien Bratieres <sb358@cam.ac.uk>
>>>
>>>  Hi,
>>>       
>>>> Thanks Grant, that did it. I'll figure out later what's going on.
>>>>
>>>> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
>>>> to run the Dirichlet example, which I launch with
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
>>>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>>>
>>>> This fails with
>>>> java.lang.NoClassDefFoundError: org/apache/mahout/clustering/dirichlet/DirichletJob
>>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>>>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>>>
>>>> DirichletJob is located in the .job file, inside
>>>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't
>>>> find it.
>>>>
>>>> One difference between kMeans and Dirichlet is
>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>>>>   JobConf conf = new JobConf(Job.class);
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>>>>   JobConf conf = new JobConf(DirichletJob.class);
>>>> i.e. the Dirichlet version uses a job class which is in core, while the
>>>> kMeans version uses the currently executing Job class from examples. Is
>>>> there an issue with this?
>>>>
>>>> What should I do to work around this error? Should the MANIFEST.MF file
>>>> of the .job contain a pointer to the /lib directory so that the jars
>>>> there are visible to the jar classloader?
>>>>
>>>> Thanks
>>>> Sebastien
>>>>
>>>>
>>>> 2009/5/14 Grant Ingersoll <gsingers@apache.org>
>>>>
>>>>  Try running mvn install from the top level dir first.
>>>>         
>>>>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>>           
>>>>>> I'd like to walk in the footsteps of Stephen Green running Mahout on
>>>>>> EMR.
>>>>>>
>>>>>> He points out that the fix to issue 118 is needed to do that (I first
>>>>>> ran into the file system error too). I'm a first-time Maven user and I
>>>>>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>>>>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>>>>>> - highlight mahout-examples project
>>>>>> - right-click Run As / Maven package (though I'm not sure at all that
>>>>>> Maven package is the right option to use!)
>>>>>>
>>>>>> but that gives me this error
>>>>>> ---
>>>>>> [INFO] Scanning for projects...
>>>>>> [INFO]
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] Building Mahout examples
>>>>>> [INFO]
>>>>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>> [INFO] task-segment: [package]
>>>>>> [INFO]
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] [resources:resources]
>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>> [INFO] Copying 0 resource
>>>>>> [INFO] [resources:copy-resources]
>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>> [INFO] Copying 3 resources
>>>>>> [INFO] [compiler:compile]
>>>>>> [INFO] Nothing to compile - all classes are up to date
>>>>>> [INFO] [resources:testResources]
>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>> [INFO] Copying 3 resources
>>>>>> [ERROR]
>>>>>>
>>>>>> Transitive dependency resolution for scope: test has failed for your
>>>>>> project.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Error message: Missing:
>>>>>> ----------
>>>>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>>
>>>>>> Try downloading the file manually from the project website.
>>>>>>
>>>>>> Then, install it using the command:
>>>>>>   mvn install:install-file -DgroupId=org.apache.mahout
>>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>>> -Dpackaging=test-jar -Dfile=/path/to/file
>>>>>>
>>>>>> Alternatively, if you host your own repository you can deploy the file
>>>>>> there:
>>>>>>   mvn deploy:deploy-file -DgroupId=org.apache.mahout
>>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>>>>>> -DrepositoryId=[id]
>>>>>>
>>>>>> Path to dependency:
>>>>>>     1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>     2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>>
>>>>>> ----------
>>>>>> 1 required artifact is missing.
>>>>>>
>>>>>> for artifact:
>>>>>> org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>
>>>>>> from the specified remote repositories:
>>>>>> Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>>>>> maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>>>> central (http://repo1.maven.org/maven2)
>>>>>>
>>>>>> Group-Id: org.apache.mahout
>>>>>> Artifact-Id: mahout-examples
>>>>>> Version: 0.2-SNAPSHOT
>>>>>> From file: C:\workspace\mahout\examples\pom.xml
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> [INFO]
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] For more information, run with the -e flag
>>>>>> [INFO]
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] BUILD FAILED
>>>>>> [INFO]
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>> [INFO] Total time: 6 seconds
>>>>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>>>>> [INFO] Final Memory: 3M/22M
>>>>>> [INFO]
>>>>>>
>>>>>> ------------------------------------------------------------------------
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> So again, my goal is to have a new mahout-examples-1.0.job file or
>>>>>> equivalent that contains the patch for 118 and will run on EMR. What
>>>>>> is the right way to do this ?
>>>>>>
>>>>>> Thanks
>>>>>> Sebastien
>>>>>>
>>>>>>
>>>>>>             
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://www.lucidimagination.com/
>>>>>
>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>>> Solr/Lucene:
>>>>> http://www.lucidimagination.com/search
>>>>>
>>>>>
>>>>>
>>>>>           
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>>     
>
>   

