mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: running Dirichlet example on AEMR
Date Mon, 18 May 2009 11:02:25 GMT
I don't know much about AEMR, so can you tell me more about the Ruby CLI
stuff? Does that factor in?


On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:

> Hi,
>
> I am still trying to make this work. I am running AEMR with the latest
> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
> ruby elastic-mapreduce -j j-26RJO9A4WJIJS \
>   --jar s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job \
>   --main-class org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job \
>   --arg s3n://myBucket/mahout-input/synthetic-control.data \
>   --arg s3n://myBucket/mahout-output/dirichlet \
>   --arg org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution \
>   --arg 10 --arg 5 --arg 1.0 --arg 1
>
> This gave me the class not found error mentioned in my previous email.
>
> I have tried the following: I moved the DirichletJob class from the core
> project into the examples project, putting it in
> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale is
> that the classloader then does not need to look into
> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
> finds it directly alongside Job.class.
>
> This got me one step further, but an error of the same type stops me again:
>
> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>    at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
>    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
>    ... 8 more
>
> This happens on a .loadClass() from the current thread's classloader.
>
> I have tried running this example on my local single-node Hadoop
> installation: this runs fine. The error above occurs only with Amazon
> Elastic MapReduce, and definitely seems related to classloading issues.
>
> Any ideas?
>
> Thanks
> Sebastien
>
> 2009/5/15 Sebastien Bratieres <sb358@cam.ac.uk>
>
>> Hi,
>>
>> Thanks Grant, that did it. I'll figure out later what's going on.
>>
>> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I
>> want to run the Dirichlet example, which I launch with
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>
>> This fails with
>> java.lang.NoClassDefFoundError: org/apache/mahout/clustering/dirichlet/DirichletJob
>>    at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>>    at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>> DirichletJob is located in the .job file, inside
>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't
>> find it.
>>
>> One difference between kMeans and Dirichlet is:
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job, line 74:
>>    JobConf conf = new JobConf(Job.class);
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job, line 80:
>>    JobConf conf = new JobConf(DirichletJob.class);
>> i.e. the Dirichlet version uses a job class that lives in core, while the
>> kMeans version uses the currently executing Job class from examples. Is
>> there an issue with this?
>>
>> What should I do to work around this error? Does the MANIFEST.MF file of
>> the .job need to contain a pointer to the /lib directory for the jars
>> there to be visible to the jar classloader?
>>
>> Thanks
>> Sebastien
>>
>>
>> 2009/5/14 Grant Ingersoll <gsingers@apache.org>
>>
>>> Try running mvn install from the top level dir first.
>>>
>>>
>>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>>
>>>> Hi,
>>>>
>>>> I'd like to walk in the footsteps of Stephen Green running Mahout on EMR.
>>>>
>>>> He points out that the fix to issue 118 is needed to do that (I first
>>>> ran into the file system error too). I'm a first-time Maven user and I
>>>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried:
>>>> - highlight mahout-examples project
>>>> - right-click Run As / Maven package (though I'm not at all sure that
>>>>   Maven package is the right option to use!)
>>>>
>>>> but that gives me this error
>>>> ---
>>>> [INFO] Scanning for projects...
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Building Mahout examples
>>>> [INFO]
>>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>> [INFO] task-segment: [package]
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] [resources:resources]
>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>> [INFO] Copying 0 resource
>>>> [INFO] [resources:copy-resources]
>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>> [INFO] Copying 3 resources
>>>> [INFO] [compiler:compile]
>>>> [INFO] Nothing to compile - all classes are up to date
>>>> [INFO] [resources:testResources]
>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>> [INFO] Copying 3 resources
>>>> [ERROR]
>>>>
>>>> Transitive dependency resolution for scope: test has failed for your project.
>>>>
>>>>
>>>>
>>>> Error message: Missing:
>>>> ----------
>>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>
>>>> Try downloading the file manually from the project website.
>>>>
>>>> Then, install it using the command:
>>>>    mvn install:install-file -DgroupId=org.apache.mahout -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests -Dpackaging=test-jar -Dfile=/path/to/file
>>>>
>>>> Alternatively, if you host your own repository you can deploy the file there:
>>>>    mvn deploy:deploy-file -DgroupId=org.apache.mahout -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
>>>>
>>>> Path to dependency:
>>>>      1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>      2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>
>>>> ----------
>>>> 1 required artifact is missing.
>>>>
>>>> for artifact:
>>>> org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>
>>>> from the specified remote repositories:
>>>> Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>>> maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>> central (http://repo1.maven.org/maven2)
>>>>
>>>> Group-Id: org.apache.mahout
>>>> Artifact-Id: mahout-examples
>>>> Version: 0.2-SNAPSHOT
>>>> From file: C:\workspace\mahout\examples\pom.xml
>>>>
>>>>
>>>>
>>>>
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] For more information, run with the -e flag
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] BUILD FAILED
>>>> [INFO] ------------------------------------------------------------------------
>>>> [INFO] Total time: 6 seconds
>>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>>> [INFO] Final Memory: 3M/22M
>>>> [INFO] ------------------------------------------------------------------------
>>>>
>>>> ---
>>>>
>>>> So again, my goal is to have a new mahout-examples-1.0.job file (or
>>>> equivalent) that contains the patch for 118 and will run on EMR. What
>>>> is the right way to do this?
>>>>
>>>> Thanks
>>>> Sebastien
>>>>
>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>
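Sebastien reports that the NormalScModelDistribution lookup in DirichletDriver.createState fails on a loadClass() call against the current thread's classloader. One common defensive pattern is to try the thread context classloader first and then fall back to the loader that loaded the calling class. The sketch below assumes a hypothetical helper named FallbackClassLoading (not a Mahout or Hadoop class). It only helps when the fallback loader can actually see the target class, for example when the helper sits in the same jar as that class. It cannot help if the class is missing from the task classpath altogether.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical helper (not part of Mahout or Hadoop): resolve a class by name,
 * trying the thread context classloader first and then the loader that loaded
 * this helper class.
 */
public final class FallbackClassLoading {

    private FallbackClassLoading() {
    }

    /** Loaders to try, in order, with nulls and duplicates removed. */
    static Set<ClassLoader> candidateLoaders() {
        Set<ClassLoader> loaders = new LinkedHashSet<ClassLoader>();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            loaders.add(context);
        }
        ClassLoader own = FallbackClassLoading.class.getClassLoader();
        if (own != null) {
            loaders.add(own);
        }
        return loaders;
    }

    /** Returns the class if any candidate loader can find it, otherwise empty. */
    public static Optional<Class<?>> find(String className) {
        for (ClassLoader loader : candidateLoaders()) {
            try {
                Class<?> found = loader.loadClass(className);
                return Optional.<Class<?>>of(found);
            } catch (ClassNotFoundException e) {
                // Not visible to this loader; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(name + " -> " + (find(name).isPresent() ? "found" : "not found"));
    }
}
```

Returning an Optional instead of throwing lets the caller decide how to report a missing class, for example by logging which loaders were tried before failing the task.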
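The Dirichlet example runs on a local single-node Hadoop install but fails on Elastic MapReduce, so it may help to compare what the classloaders can see in each environment. The sketch below assumes a hypothetical ClassLoaderReport utility that uses only the JDK. It walks a classloader's parent chain and lists the URLs of any URLClassLoader it finds. Logging its output from the driver and from inside a mapper on both clusters would show whether lib/mahout-core-0.2-SNAPSHOT.jar and the examples classes are on each path. On Java 9 and later the application loader is no longer a URLClassLoader, so only its class name is reported.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: describe a classloader and all of its parents. */
public final class ClassLoaderReport {

    private ClassLoaderReport() {
    }

    /** One line per loader (its class name), followed by indented URLs when available. */
    public static List<String> describe(ClassLoader start) {
        List<String> lines = new ArrayList<String>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            lines.add(cl.getClass().getName());
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    lines.add("  " + url);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints the chain for the current thread; call describe(...) from a
        // mapper instead to see the task-side view.
        for (String line : describe(Thread.currentThread().getContextClassLoader())) {
            System.out.println(line);
        }
    }
}
```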
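This relates to the JobConf(Job.class) versus JobConf(DirichletJob.class) difference Sebastien points out. As far as I know, in Hadoop releases of that era the JobConf(Class) constructor calls setJarByClass(), which selects as the job jar whichever jar on the classpath contains the given class. The sketch below shows that general "find the containing jar" technique using only the JDK. It mirrors the idea, not Hadoop's exact code, and ContainingJar is a hypothetical name. One plausible reading, not confirmed in this thread, is that the lookup for DirichletJob resolves to the unpacked lib/mahout-core-0.2-SNAPSHOT.jar rather than to the .job. In that case the tasks would receive a jar without the examples classes, which would fit the NormalScModelDistribution error.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLDecoder;
import java.util.Enumeration;

/**
 * Hypothetical sketch of "which jar contains this class?", the same idea
 * Hadoop's JobConf relies on when it is constructed with a Class argument.
 */
public final class ContainingJar {

    private ContainingJar() {
    }

    /**
     * Extracts the jar path from a resource URL such as
     * "jar:file:/tmp/app.jar!/pkg/Name.class"; returns null for non-jar URLs.
     */
    public static String jarPathOf(String resourceUrl) {
        if (!resourceUrl.startsWith("jar:file:")) {
            return null;
        }
        String path = resourceUrl.substring("jar:file:".length());
        int bang = path.indexOf('!');
        if (bang >= 0) {
            path = path.substring(0, bang);
        }
        try {
            return URLDecoder.decode(path, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Returns the first jar on the classpath that provides clazz, or null. */
    public static String find(Class<?> clazz) throws IOException {
        ClassLoader loader = clazz.getClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        String resource = clazz.getName().replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            String jar = jarPathOf(urls.nextElement().toString());
            if (jar != null) {
                return jar;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Class<?> clazz = Class.forName(args.length > 0 ? args[0] : ContainingJar.class.getName());
        System.out.println(clazz.getName() + " -> " + find(clazz));
    }
}
```

Running find() on both Job.class and DirichletJob.class from the same classpath would show whether the two constructors end up shipping different jars.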
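This addresses the MANIFEST.MF question. A jar's manifest can declare a Class-Path attribute. Per the JAR file specification, however, its entries are relative URLs resolved against the jar's own location, so they point at jars next to it, not at jars nested inside it under lib/. As far as I know, Hadoop's RunJar handles lib/ on its own by unpacking the .job and adding the jars under lib/ to the classpath. The hypothetical helper below reads the Main-Class and Class-Path attributes of a jar or .job file, so you can check what a build actually produced.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: print the Main-Class and Class-Path of a jar or .job file. */
public final class ManifestInspector {

    private ManifestInspector() {
    }

    /** Splits the space-separated Class-Path attribute; empty if absent. */
    public static String[] classPathEntries(Manifest manifest) {
        String value = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (value == null || value.trim().isEmpty()) {
            return new String[0];
        }
        return value.trim().split("\\s+");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ManifestInspector <path-to-jar-or-job>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                System.out.println("no META-INF/MANIFEST.MF");
                return;
            }
            System.out.println("Main-Class: "
                    + manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
            for (String entry : classPathEntries(manifest)) {
                System.out.println("Class-Path entry: " + entry);
            }
        }
    }
}
```

For example, running it with mahout-examples-0.2-SNAPSHOT.job as the argument prints whatever Main-Class and Class-Path that build declares, or reports that there is no manifest.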


