mahout-user mailing list archives

From Sebastien Bratieres <sb...@cam.ac.uk>
Subject Re: running Dirichlet example on AEMR
Date Fri, 15 May 2009 21:03:17 GMT
Hi,

I am still trying to make this work. I am running AEMR with the latest
mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
ruby elastic-mapreduce -j j-26RJO9A4WJIJS --jar
s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job --main-class
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job --arg
s3n://myBucket/mahout-input/synthetic-control.data --arg
s3n://myBucket/mahout-output/dirichlet --arg
org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
--arg 10 --arg 5 --arg 1.0 --arg 1

This gave me the class not found error mentioned in my previous email.

I have tried the following: I moved the DirichletJob class from the core
project into the examples project, putting it in
org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale is
that the classloader then no longer needs to look into
lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
finds it directly alongside Job.class.
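
To check where a given classloader actually finds a class (top level of the
.job, a nested jar under lib/, or nowhere), a small probe along these lines
can help. This is only a sketch using plain JDK calls; the class name
WhereIsClass and its locate() helper are hypothetical and not part of Mahout
or Hadoop.

```java
import java.net.URL;

public class WhereIsClass {

    /**
     * Returns the URL of the .class resource as seen by the given loader,
     * or null if that loader cannot see the class at all.
     */
    public static URL locate(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        if (loader == null) {
            // A null loader stands for the bootstrap loader; ask the system loader instead.
            return ClassLoader.getSystemResource(resource);
        }
        return loader.getResource(resource);
    }

    public static void main(String[] args) {
        // Default to a JDK class so the probe runs without any extra jars.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader system = ClassLoader.getSystemClassLoader();
        System.out.println("context loader: " + locate(name, context));
        System.out.println("system loader:  " + locate(name, system));
    }
}
```

Passing a class name such as
org.apache.mahout.clustering.dirichlet.DirichletJob as the argument would
show whether either loader can resolve it; a null result means that loader
does not see the class.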

This got me one step further, but an error of the same type stops me again:

java.lang.ClassNotFoundException:
org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at
org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
    at
org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
    ... 8 more

This happens on a .loadClass() from the current thread's classloader.

I have tried running this example on my local single-node Hadoop
installation, where it runs fine. The error above occurs only with Amazon
Elastic MapReduce, and seems to be a classloading issue.
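
One way to narrow this down is to attempt the lookup against more than one
classloader and see which of them, if any, succeeds. The following is a
minimal sketch under that assumption, not the code Mahout uses in
DirichletDriver; LoaderFallback and load() are hypothetical names.

```java
public class LoaderFallback {

    /**
     * Tries the thread context classloader first, then falls back to the
     * loader that loaded this class. Throws ClassNotFoundException if
     * neither can resolve the name.
     */
    public static Class<?> load(String className) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException e) {
                // Not visible to the context loader; try the fallback below.
            }
        }
        return Class.forName(className, true, LoaderFallback.class.getClassLoader());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Default to a JDK class so the example runs without extra jars.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Class<?> cls = load(name);
        System.out.println(cls.getName() + " loaded by " + cls.getClassLoader());
    }
}
```

If both loaders fail for NormalScModelDistribution on EMR but succeed
locally, that would point at how the .job is unpacked and put on the task
classpath rather than at the Mahout code itself.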

Any ideas?

Thanks
Sebastien

2009/5/15 Sebastien Bratieres <sb358@cam.ac.uk>

> Hi,
>
> Thanks Grant, that did it. I'll figure out later what's going on.
>
> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
> to run the Dirichlet example, which I launch with
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
> class from the mahout-examples-0.2-SNAPSHOT.job.
>
> This fails with
> java.lang.NoClassDefFoundError:
> org/apache/mahout/clustering/dirichlet/DirichletJob
>     at
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>     at
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> DirichletJob is located in the .job file, inside
> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't find
> it.
>
> One difference between kMeans and Dirichlet is
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>     JobConf conf = new JobConf(Job.class);
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>     JobConf conf = new JobConf(DirichletJob.class);
> i.e. the Dirichlet version uses a job class which is in core, while the
> kMeans version uses the currently executing Job class from examples. Is
> there an issue with this?
>
> What should I do to work around this error? Should the MANIFEST.MF file of
> the .job contain a pointer to the /lib directory, so that the jars there
> are visible to the jar classloader?
>
> Thanks
> Sebastien
>
>
> 2009/5/14 Grant Ingersoll <gsingers@apache.org>
>
>> Try running mvn install from the top level dir first.
>>
>>
>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>
>>  Hi,
>>>
>>> I'd like to walk in the footsteps of Stephen Green running Mahout on EMR.
>>>
>>> He points out that the fix to issue 118 is needed to do that (I first
>>> ran into the file system error too). I'm a first-time Maven user and I
>>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>>> - highlight mahout-examples project
>>> - right-click Run As / Maven package (though I'm not sure at all that
>>> Maven package is the right option to use!)
>>>
>>> but that gives me this error
>>> ---
>>> [INFO] Scanning for projects...
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] Building Mahout examples
>>> [INFO]
>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>> [INFO] task-segment: [package]
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] [resources:resources]
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] Copying 0 resource
>>> [INFO] [resources:copy-resources]
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] Copying 3 resources
>>> [INFO] [compiler:compile]
>>> [INFO] Nothing to compile - all classes are up to date
>>> [INFO] [resources:testResources]
>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>> [INFO] Copying 3 resources
>>> [ERROR]
>>>
>>> Transitive dependency resolution for scope: test has failed for your
>>> project.
>>>
>>>
>>>
>>> Error message: Missing:
>>> ----------
>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>
>>>  Try downloading the file manually from the project website.
>>>
>>>  Then, install it using the command:
>>>     mvn install:install-file -DgroupId=org.apache.mahout
>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>> -Dpackaging=test-jar -Dfile=/path/to/file
>>>
>>>  Alternatively, if you host your own repository you can deploy the file
>>> there:
>>>     mvn deploy:deploy-file -DgroupId=org.apache.mahout
>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>>> -DrepositoryId=[id]
>>>
>>>  Path to dependency:
>>>       1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>       2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>
>>> ----------
>>> 1 required artifact is missing.
>>>
>>> for artifact:
>>>  org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>
>>> from the specified remote repositories:
>>>  Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>>  maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>  central (http://repo1.maven.org/maven2)
>>>
>>> Group-Id: org.apache.mahout
>>> Artifact-Id: mahout-examples
>>> Version: 0.2-SNAPSHOT
>>> From file: C:\workspace\mahout\examples\pom.xml
>>>
>>>
>>>
>>>
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] For more information, run with the -e flag
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] BUILD FAILED
>>> [INFO]
>>> ------------------------------------------------------------------------
>>> [INFO] Total time: 6 seconds
>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>> [INFO] Final Memory: 3M/22M
>>> [INFO]
>>> ------------------------------------------------------------------------
>>>
>>> ---
>>>
>>> So again, my goal is to have a new mahout-examples-1.0.job file or
>>> equivalent that contains the patch for 118 and will run on EMR. What
>>> is the right way to do this ?
>>>
>>> Thanks
>>> Sebastien
>>>
>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>
