mahout-user mailing list archives

From Sebastien Bratieres <sb...@cam.ac.uk>
Subject Re: running Dirichlet example on AEMR
Date Mon, 18 May 2009 12:12:35 GMT
Hi Grant,

It doesn't look like the CLI has anything to do with my issue -- it's just a
command-line interface to drive the Amazon machines and jobs you run there
remotely. It sends HTTP requests to Amazon to switch machines on and off,
start jobs etc. My issue seems linked either to the AEMR setup or to something
peculiar about classloading in the Dirichlet sample (I say this because the
kMeans example runs fine).
If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys, I
think I'll try with AEMR staff.
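
One way to test the classloading hypothesis is a small diagnostic that asks two different loaders whether they can resolve a given class name: the current thread's context classloader (the one the stack trace below goes through) and the loader that loaded the calling class. This is only a sketch; the class name ClassVisibility and its canLoad method are made up for illustration and are not part of Mahout or Hadoop.

```java
// Hypothetical diagnostic helper, not part of Mahout or Hadoop.
class ClassVisibility {

    /**
     * Returns true if the given loader (or one of its parents) can resolve
     * the class name. A null loader means the bootstrap loader.
     */
    public static boolean canLoad(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the class to probe as the first argument, e.g.
        // org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
        String name = args.length > 0 ? args[0] : "java.lang.String";

        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = ClassVisibility.class.getClassLoader();

        System.out.println("context loader " + context + " sees " + name + ": "
                + canLoad(name, context));
        System.out.println("own loader " + own + " sees " + name + ": "
                + canLoad(name, own));
    }
}
```

If the two answers differ inside a map task, that would suggest the context classloader is not the one that was given the contents of the .job file.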

Thanks
Sebastien

2009/5/18 Grant Ingersoll <gsingers@apache.org>

> I don't know much about AEMR, so, tell me more about the Ruby CLI stuff?
>  Does that factor in?
>
>
>
> On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:
>
>> Hi,
>>
>> I am still trying to make this work. I am running AEMR with the latest
>> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
>> ruby elastic-mapreduce -j j-26RJO9A4WJIJS \
>>   --jar s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job \
>>   --main-class org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job \
>>   --arg s3n://myBucket/mahout-input/synthetic-control.data \
>>   --arg s3n://myBucket/mahout-output/dirichlet \
>>   --arg org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution \
>>   --arg 10 --arg 5 --arg 1.0 --arg 1
>>
>> This gave me the class not found error mentioned in my previous email.
>>
>> I have tried the following: I moved the DirichletJob class from the core
>> project into the examples project, putting it in
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale is
>> that the classloader then does not need to look into
>> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
>> finds it directly alongside Job.class.
>>
>> This got me one step further, but an error of the same type stops me
>> again:
>>
>> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>   at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
>>   at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
>>   ... 8 more
>>
>> This happens on a .loadClass() from the current thread's classloader.
>>
>> I have tried running this example on my local single-node Hadoop
>> installation: this runs fine. The error above occurs only with Amazon
>> Elastic MapReduce, and definitely seems related to classloading issues.
>>
>> Any ideas?
>>
>> Thanks
>> Sebastien
>>
>> 2009/5/15 Sebastien Bratieres <sb358@cam.ac.uk>
>>
>>> Hi,
>>>
>>> Thanks Grant, that did it. I'll figure out later what's going on.
>>>
>>> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
>>> to run the Dirichlet example, which I launch with
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
>>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>>
>>> This fails with
>>> java.lang.NoClassDefFoundError: org/apache/mahout/clustering/dirichlet/DirichletJob
>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>>
>>> DirichletJob is located in the .job file, inside
>>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't
>>> find it.
>>>
>>> One difference between kMeans and Dirichlet is
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>>>   JobConf conf = new JobConf(Job.class);
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>>>   JobConf conf = new JobConf(DirichletJob.class);
>>> i.e. the Dirichlet version uses a job class which is in core, while the
>>> kMeans version uses the currently executing Job class from examples. Is
>>> there an issue with this?
>>>
>>> What should I do to work around this error? Does the MANIFEST.MF file of
>>> the .job need to contain a pointer to the /lib directory for the jars there
>>> to be visible to the jar classloader?
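
On the manifest question above: as far as I know, a plain JVM classloader does not read jars nested inside another jar, and the manifest Class-Path attribute refers to external jars relative to the archive's location, not to entries inside it. Hadoop normally handles this by unpacking the job file and adding the lib/ jars to the classpath itself. The following sketch only inspects a .job file to show what it contains; the class name JobFileInspector and its methods are hypothetical, and the file path is supplied on the command line.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical inspection helper, not part of Mahout or Hadoop.
class JobFileInspector {

    /** True for entries such as lib/mahout-core-0.2-SNAPSHOT.jar. */
    public static boolean isNestedJar(String entryName) {
        return entryName.startsWith("lib/") && entryName.endsWith(".jar");
    }

    /** Returns the sorted names of all jars stored under lib/ in the archive. */
    public static List<String> nestedJars(JarFile job) {
        List<String> result = new ArrayList<>();
        Enumeration<JarEntry> entries = job.entries();
        while (entries.hasMoreElements()) {
            String name = entries.nextElement().getName();
            if (isNestedJar(name)) {
                result.add(name);
            }
        }
        Collections.sort(result);
        return result;
    }

    /** Returns the manifest's Class-Path attribute, or null if there is none. */
    public static String manifestClassPath(JarFile job) throws IOException {
        Manifest manifest = job.getManifest();
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: local path to the .job file, e.g. mahout-examples-0.2-SNAPSHOT.job
        try (JarFile job = new JarFile(args[0])) {
            System.out.println("Manifest Class-Path: " + manifestClassPath(job));
            for (String name : nestedJars(job)) {
                System.out.println("nested jar: " + name);
            }
        }
    }
}
```

Running it against the .job file confirms whether mahout-core is packaged under lib/ and whether the manifest mentions it. It does not tell you what the EMR task JVMs actually put on their classpath.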
>>>
>>> Thanks
>>> Sebastien
>>>
>>>
>>> 2009/5/14 Grant Ingersoll <gsingers@apache.org>
>>>
>>>> Try running mvn install from the top level dir first.
>>>>
>>>>
>>>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to walk in the footsteps of Stephen Green running Mahout on
>>>>> EMR.
>>>>>
>>>>> He points out that the fix to issue 118 is needed to do that (I first
>>>>> ran into the file system error too). I'm a first-time Maven user and I
>>>>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>>>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>>>>> - highlight mahout-examples project
>>>>> - right-click Run As / Maven package (though I'm not sure at all that
>>>>> Maven package is the right option to use!)
>>>>>
>>>>> but that gives me this error
>>>>> ---
>>>>> [INFO] Scanning for projects...
>>>>> [INFO] ------------------------------------------------------------------------
>>>>> [INFO] Building Mahout examples
>>>>> [INFO]
>>>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>> [INFO] task-segment: [package]
>>>>> [INFO] ------------------------------------------------------------------------
>>>>> [INFO] [resources:resources]
>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>> [INFO] Copying 0 resource
>>>>> [INFO] [resources:copy-resources]
>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>> [INFO] Copying 3 resources
>>>>> [INFO] [compiler:compile]
>>>>> [INFO] Nothing to compile - all classes are up to date
>>>>> [INFO] [resources:testResources]
>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>> [INFO] Copying 3 resources
>>>>> [ERROR]
>>>>>
>>>>> Transitive dependency resolution for scope: test has failed for your
>>>>> project.
>>>>>
>>>>>
>>>>>
>>>>> Error message: Missing:
>>>>> ----------
>>>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>
>>>>> Try downloading the file manually from the project website.
>>>>>
>>>>> Then, install it using the command:
>>>>>   mvn install:install-file -DgroupId=org.apache.mahout
>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>> -Dpackaging=test-jar -Dfile=/path/to/file
>>>>>
>>>>> Alternatively, if you host your own repository you can deploy the file
>>>>> there:
>>>>>   mvn deploy:deploy-file -DgroupId=org.apache.mahout
>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>>>>> -DrepositoryId=[id]
>>>>>
>>>>> Path to dependency:
>>>>>     1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>     2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>
>>>>> ----------
>>>>> 1 required artifact is missing.
>>>>>
>>>>> for artifact:
>>>>> org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>
>>>>> from the specified remote repositories:
>>>>> Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>>>> maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>>> central (http://repo1.maven.org/maven2)
>>>>>
>>>>> Group-Id: org.apache.mahout
>>>>> Artifact-Id: mahout-examples
>>>>> Version: 0.2-SNAPSHOT
>>>>> From file: C:\workspace\mahout\examples\pom.xml
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [INFO] ------------------------------------------------------------------------
>>>>> [INFO] For more information, run with the -e flag
>>>>> [INFO] ------------------------------------------------------------------------
>>>>> [INFO] BUILD FAILED
>>>>> [INFO] ------------------------------------------------------------------------
>>>>> [INFO] Total time: 6 seconds
>>>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>>>> [INFO] Final Memory: 3M/22M
>>>>> [INFO] ------------------------------------------------------------------------
>>>>>
>>>>> ---
>>>>>
>>>>> So again, my goal is to have a new mahout-examples-1.0.job file or
>>>>> equivalent that contains the patch for 118 and will run on EMR. What
>>>>> is the right way to do this?
>>>>>
>>>>> Thanks
>>>>> Sebastien
>>>>>
>>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>>
>>>
