mahout-user mailing list archives

From: Terry Blankers <te...@amritanet.com>
Subject: Re: SparseVectorsFromSequenceFiles StandardAnalyzer ClassNotFoundException issue
Date: Wed, 04 Jun 2014 05:16:32 GMT
Hi Suneel, can you please provide a little more detail? I still 
can't get this to work.

Which classpath are the Lucene jars supposed to be added to? My Java 
project, or the Hadoop instance?
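
In case it helps pin down what I'm asking: is the idea to ship the jar to 
the map tasks, e.g. by passing -libjars as a Hadoop generic option through 
ToolRunner? Below is a minimal sketch of what I'm guessing at (the jar path 
is just a placeholder for wherever lucene-analyzers-common-4.6.1.jar lives, 
and I haven't verified that this is the right approach):

    // Unverified sketch: pass -libjars as a Hadoop generic option so that
    // ToolRunner/GenericOptionsParser ships the Lucene jar to the tasks.
    // The jar path below is a placeholder, not its real location.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;
    import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

    public class RunSeq2Sparse {
        public static void main(String[] unused) throws Exception {
            Configuration configuration = new Configuration();
            String[] args = {
                    "-libjars", "/path/to/lucene-analyzers-common-4.6.1.jar",
                    "--input", "/input/index",
                    "--output", "/output/vectors",
                    "--maxNGramSize", "3",
                    "--namedVector", "--overwrite"
            };
            int exitCode = ToolRunner.run(configuration,
                    new SparseVectorsFromSequenceFiles(), args);
            System.exit(exitCode);
        }
    }

Or is putting the jar on the Hadoop nodes themselves the intended route? I'm 
not sure which side the classpath change belongs on.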

Thanks,

Terry


On 6/3/14, 5:35 PM, Terry Blankers wrote:
> Thanks Suneel. I thought having the jar as a dependency and the class 
> imported was enough.
>
>
> On 6/3/14, 4:18 PM, Suneel Marthi wrote:
>> You're missing the Lucene jars from your classpath. Mahout is presently 
>> at Lucene 4.6.1; that's what you should be including.
>>
>>
>>
>> On Tuesday, June 3, 2014 3:40 PM, Terry Blankers 
>> <terry@amritanet.com> wrote:
>>
>>
>> Hello, can anyone please give me a clue as to what I may be missing 
>> here?
>>
>> I'm trying to run a SparseVectorsFromSequenceFiles job via ToolRunner
>> from a java project and I'm getting the following exception:
>>
>> Error: java.lang.ClassNotFoundException: org.apache.lucene.analysis.standard.StandardAnalyzer
>>       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>       at org.apache.mahout.vectorizer.document.SequenceFileTokenizerMapper.setup(SequenceFileTokenizerMapper.java:62)
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
>>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>>       at java.security.AccessController.doPrivileged(Native Method)
>>       at javax.security.auth.Subject.doAs(Subject.java:415)
>>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
>>
>>
>> I've tried adding the location of lucene-analyzers-common-4.6.1.jar to
>> my Hadoop classpath, which doesn't make any difference.
>>
>>
>> I'm running against Hadoop 2.2 and Mahout trunk, compiled with:
>>
>>       mvn clean install  -Dhadoop2.version=2.2.0 -DskipTests
>>
>>
>> I'm trying to run the job like this:
>>
>>       String[] args = {"--input","/input/index"
>>               ,"--output","/output/vectors"
>>               ,"--maxNGramSize","3"
>>               ,"--namedVector", "--overwrite"
>>       };
>>       SparseVectorsFromSequenceFiles sparse = new SparseVectorsFromSequenceFiles();
>>       ToolRunner.run(configuration, sparse, args);
>>
>>
>> Running seq2sparse from the commandline works successfully with no
>> exceptions:
>>
>>       $MAHOUT_HOME/bin/mahout seq2sparse -i /input/index --namedVector -o /output/vectors -ow --maxNGramSize 3
>>
>>
>> Many thanks,
>>
>> Terry
>

