mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com.INVALID>
Subject Re: SparseVectorsFromSequenceFiles StandardAnalyzer ClassNotFoundException issue
Date Tue, 03 Jun 2014 20:18:33 GMT

You're missing the Lucene jars from your classpath. Mahout is presently on Lucene 4.6.1; that's
what you should be including.
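
If you're launching the job from your own code through ToolRunner, one option is to pass the
Lucene jars with Hadoop's generic -libjars flag so they get shipped to the task JVMs (the
ClassNotFoundException in your trace is thrown in YarnChild, i.e. in a mapper, not on the
client). A rough sketch follows; the wrapper class name and jar paths are just placeholders for
wherever the Lucene 4.6.1 jars live on your machine:

     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.util.ToolRunner;
     import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

     public class Seq2SparseRunner {
         public static void main(String[] argv) throws Exception {
             Configuration configuration = new Configuration();
             // -libjars is a generic Hadoop option handled by ToolRunner/GenericOptionsParser
             // before the Mahout options are parsed; the listed jars are shipped to the
             // MapReduce task JVMs so StandardAnalyzer resolves in the mappers.
             // The jar paths below are placeholders.
             String[] args = {
                     "-libjars", "/path/to/lucene-analyzers-common-4.6.1.jar,/path/to/lucene-core-4.6.1.jar",
                     "--input", "/input/index",
                     "--output", "/output/vectors",
                     "--maxNGramSize", "3",
                     "--namedVector", "--overwrite"
             };
             int exitCode = ToolRunner.run(configuration, new SparseVectorsFromSequenceFiles(), args);
             System.exit(exitCode);
         }
     }

Note that HADOOP_CLASSPATH only affects the client process, which is why setting it alone may
not make a difference for a class that's missing inside the tasks.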



On Tuesday, June 3, 2014 3:40 PM, Terry Blankers <terry@amritanet.com> wrote:

Hello, can anyone please give me a clue as to what I may be missing here?

I'm trying to run a SparseVectorsFromSequenceFiles job via ToolRunner
from a Java project, and I'm getting the following exception:

Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.standard.StandardAnalyzer
     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
     at java.security.AccessController.doPrivileged(Native Method)
     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
     at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:62)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)


I've tried adding the location of lucene-analyzers-common-4.6.1.jar to
my Hadoop classpath, but it doesn't make any difference.


I'm running against Hadoop 2.2 and Mahout trunk, compiled with:

     mvn clean install  -Dhadoop2.version=2.2.0 -DskipTests


I'm trying to run the job like this:

     String[] args = {"--input","/input/index"
             ,"--output","/output/vectors"
             ,"--maxNGramSize","3"
             ,"--namedVector", "--overwrite"
     };
     SparseVectorsFromSequenceFiles sparse = new SparseVectorsFromSequenceFiles();
     ToolRunner.run(configuration, sparse, args);


Running seq2sparse from the command line works successfully with no exceptions:

     $MAHOUT_HOME/bin/mahout seq2sparse -i /input/index --namedVector -o /output/vectors -ow --maxNGramSize 3


Many thanks,

Terry