mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: i have met a problem when i do "Creating Vectors from Text"
Date Thu, 22 Oct 2009 12:05:02 GMT
You will need a "catch-all" field that collects the two fields together
into a single field (with term vectors stored), and then point the
Driver's --field option at that field.
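A catch-all field is one more field whose text is the concatenation of the fields you want to cluster on, so its term vector counts the terms of all of them. The sketch below shows the idea using only the Java standard library. The field names `title` and `body` and the lowercase/split tokenizer are hypothetical stand-ins for your own fields and Lucene Analyzer. In an actual Lucene 2.9 index you would add the concatenated text as an extra `Field` created with `Field.Index.ANALYZED` and `Field.TermVector.YES`, then re-index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch: merge several document fields into one "catch-all" text and count its terms. */
public class CatchAllField {

    /** Hypothetical stand-in for a Lucene Analyzer: lowercase, split on non-letters/digits. */
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);   // split can yield a leading empty string
        }
        return terms;
    }

    /** Concatenate the chosen fields, separated by a space so words do not run together. */
    static String catchAll(Map<String, String> doc, String... fields) {
        StringBuilder sb = new StringBuilder();
        for (String f : fields) {
            String value = doc.get(f);
            if (value == null) continue;      // a document may lack some fields
            if (sb.length() > 0) sb.append(' ');
            sb.append(value);
        }
        return sb.toString();
    }

    /** Term-frequency map of one piece of text, analogous to a stored term vector. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (String term : tokenize(text)) {
            tf.merge(term, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Mahout clustering");             // hypothetical field values
        doc.put("body", "K-means clustering with Mahout");
        // prints {mahout=2, clustering=2, k=1, means=1, with=1}
        System.out.println(termFreqs(catchAll(doc, "title", "body")));
    }
}
```

Because the catch-all is still a single field, the existing Driver can read it unchanged, and k-means then sees one vector per document built from all of its fields.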

On Oct 22, 2009, at 4:54 AM, 周峰 wrote:

> Thank you. You reminded me to store term vectors. The class
> "org.apache.lucene.demo.IndexFiles" (in
> http://lucene.apache.org/java/2_9_0/demo.html), which I used to create
> the index files, does not store term vectors.
>
> Now I can run the k-means example successfully. I have another
> question. When I use the class
> "org.apache.mahout.utils.vectors.lucene.Driver" to create vectors from
> the index files, it converts a single field of the index to an output
> file, and KMeansDriver can then run on that output file. But in my
> application I want k-means to compute on multiple fields, because I use
> multiple fields to describe one object.
> How do I achieve this? Thanks.
>
>
>
> 2009/10/22 Grant Ingersoll <gsingers@apache.org>
>
>> Do you have Term Vectors stored?
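This question points at the most likely cause of the NullPointerException quoted below. In Lucene 2.9, `IndexReader.getTermFreqVector(int, String)` returns null for a field indexed without term vectors, and the demo's IndexFiles does not store them. The real fix, as the thread concludes, is to re-index with term vectors. The sketch below only shows the kind of null check that reports the problem as a count instead of a crash. `TermVectorSource` is a hypothetical interface standing in for IndexReader so the example runs with the standard library alone, and the field name `contents` is likewise illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: skip (and count) documents whose term vector was never stored. */
public class TermVectorGuard {

    /** Hypothetical stand-in for IndexReader.getTermFreqVector(doc, field); may return null. */
    interface TermVectorSource {
        int maxDoc();
        Map<String, Integer> termVector(int doc, String field);
    }

    /** Adds every non-null vector to out and returns how many documents had none. */
    static int collect(TermVectorSource source, String field, List<Map<String, Integer>> out) {
        int missing = 0;
        for (int doc = 0; doc < source.maxDoc(); doc++) {
            Map<String, Integer> tv = source.termVector(doc, field);
            if (tv == null) {        // field indexed without term vectors, or absent
                missing++;
                continue;
            }
            out.add(tv);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Two documents; only document 0 has a stored term vector.
        Map<Integer, Map<String, Integer>> stored = new HashMap<>();
        Map<String, Integer> tv0 = new HashMap<>();
        tv0.put("lucene", 3);
        stored.put(0, tv0);

        TermVectorSource source = new TermVectorSource() {
            public int maxDoc() { return 2; }
            public Map<String, Integer> termVector(int doc, String field) {
                return "contents".equals(field) ? stored.get(doc) : null;
            }
        };

        List<Map<String, Integer>> vectors = new ArrayList<>();
        int missing = collect(source, "contents", vectors);
        System.out.println(vectors.size() + " vector(s), " + missing + " missing"); // 1 vector(s), 1 missing
    }
}
```

A nonzero "missing" count is the signal to rebuild the index with term vectors, not something to skip past silently.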
>>
>>
>> On Oct 21, 2009, at 9:52 AM, 周峰 wrote:
>>
>>> Yes, I have solved the problem. These jar files must be added to the
>>> classpath:
>>> root@master:/home/zhoufeng/mahout/trunk/utils/target/dependency# java -cp
>>> /home/zhoufeng/mahout/trunk/utils/target/mahout-utils-0.2-SNAPSHOT.jar:
>>> lucene-core-2.9.0.jar:
>>> slf4j-api-1.5.8.jar:
>>> slf4j-jcl-1.5.8.jar:
>>> commons-logging-1.1.1.jar:
>>> commons-cli-2.0-mahout.jar:
>>> /home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar:
>>> hadoop-core-0.20.1.jar
>>> org.apache.mahout.utils.vectors.lucene.Driver
>>> --dir /home/zhoufeng/newdisk/newindex/
>>> --field string
>>> -t /home/zhoufeng/newdisk/dict.txt
>>> --output /home/zhoufeng/newdisk/out.txt
>>> --max 50
>>>
>>> 2009-10-21 21:44:39 org.slf4j.impl.JCLLoggerAdapter info
>>> Output File: /home/zhoufeng/newdisk/out.txt
>>> 2009-10-21 21:44:40 org.apache.hadoop.util.NativeCodeLoader <clinit>
>>> Unable to load native-hadoop library for your platform... using
>>> builtin-java classes where applicable
>>> 2009-10-21 21:44:40 org.apache.hadoop.io.compress.CodecPool getCompressor
>>> Got brand-new compressor
>>> Exception in thread "main" java.lang.NullPointerException
>>>      at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:110)
>>>      at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:81)
>>>      at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
>>>      at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
>>>
>>> Does this exception mean there is a problem with my index files?
>>>
>>> I used the class "org.apache.lucene.demo.IndexFiles" (in
>>> http://lucene.apache.org/java/2_9_0/demo.html) to create my index
>>> files, and I have used the class "org.apache.lucene.demo.SearchFiles"
>>> (also in http://lucene.apache.org/java/2_9_0/demo.html) to search the
>>> index successfully.
>>> 2009/10/21 Grant Ingersoll <gsingers@apache.org>
>>>
>>>> Here's what I use to run it, as generated by IntelliJ (substitute <HOME>
>>>> with your appropriate value):
>>>> java -Xmx1024M -Dfile.encoding=UTF-8 -classpath
>>>> <HOME>/projects/lucene/mahout/mahout-clean/utils/target/classes:
>>>> <HOME>/projects/lucene/mahout/mahout-clean/core/target/classes:
>>>> <HOME>/.m2/repository/org/apache/mahout/hadoop/hadoop-core/0.20.1/hadoop-core-0.20.1.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/hbase/hbase/0.20.0/hbase-0.20.0.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/kosmofs/kfs/0.3/kfs-0.3.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:
>>>> <HOME>/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:
>>>> <HOME>/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:
>>>> <HOME>/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:
>>>> <HOME>/.m2/repository/commons-codec/commons-codec/1.2/commons-codec-1.2.jar:
>>>> <HOME>/.m2/repository/commons-dbcp/commons-dbcp/1.2.2/commons-dbcp-1.2.2.jar:
>>>> <HOME>/.m2/repository/commons-pool/commons-pool/1.4/commons-pool-1.4.jar:
>>>> <HOME>/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:
>>>> <HOME>/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:
>>>> <HOME>/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:
>>>> <HOME>/.m2/repository/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.jar:
>>>> <HOME>/.m2/repository/org/slf4j/slf4j-jcl/1.5.8/slf4j-jcl-1.5.8.jar:
>>>> <HOME>/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/watchmaker/watchmaker-framework/0.6.2/watchmaker-framework-0.6.2.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/watchmaker/watchmaker-swing/0.6.2/watchmaker-swing-0.6.2.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/uncommons/math/uncommons-math/1.2/uncommons-math-1.2.jar:
>>>> <HOME>/.m2/repository/com/thoughtworks/xstream/xstream/1.2.1/xstream-1.2.1.jar:
>>>> <HOME>/.m2/repository/xpp3/xpp3_min/1.1.3.4.O/xpp3_min-1.1.3.4.O.jar:
>>>> <HOME>/.m2/repository/org/apache/lucene/lucene-analyzers/2.9.0/lucene-analyzers-2.9.0.jar:
>>>> <HOME>/.m2/repository/org/apache/lucene/lucene-core/2.9.0/lucene-core-2.9.0.jar:
>>>> <HOME>/.m2/repository/org/apache/mahout/commons/commons-cli/2.0-mahout/commons-cli-2.0-mahout.jar:
>>>> <HOME>/.m2/repository/commons-math/commons-math/1.2/commons-math-1.2.jar:
>>>> <HOME>/.m2/repository/junit/junit/3.8.2/junit-3.8.2.jar:
>>>> <HOME>/.m2/repository/org/easymock/easymockclassextension/2.2/easymockclassextension-2.2.jar:
>>>> <HOME>/.m2/repository/org/easymock/easymock/2.2/easymock-2.2.jar:
>>>> <HOME>/.m2/repository/cglib/cglib-nodep/2.1_3/cglib-nodep-2.1_3.jar:
>>>> <HOME>/.m2/repository/com/google/code/gson/gson/1.3/gson-1.3.jar:
>>>> <HOME>/.m2/repository/org/easymock/easymock/2.4/easymock-2.4.jar:
>>>> <HOME>/.m2/repository/org/easymock/easymockclassextension/2.4/easymockclassextension-2.4.jar:
>>>> <HOME>/.m2/repository/cglib/cglib/2.1_3/cglib-2.1_3.jar:
>>>> <HOME>/.m2/repository/asm/asm/1.5.3/asm-1.5.3.jar
>>>> org.apache.mahout.utils.vectors.lucene.Driver
>>>> --dir <HOME>/projects/lucene/solr/wikipedia/solr/data/index
>>>> --field body
>>>> -t <HOME>/projects/lucene/solr/wikipedia/dict.txt
>>>> --output <HOME>/projects/lucene/solr/wikipedia/part-50.txt
>>>> --max 50
>>>>
>>>> One way to quickly get all of the dependencies into a single directory
>>>> for inclusion on the command line is Maven's copy-dependencies goal:
>>>>
>>>>   mvn dependency:copy-dependencies
>>>>
>>>> This copies all the dependencies (downloading them if needed) into a
>>>> subdirectory of the target directory (target/dependency by default).
>>>>
>>>>
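After `mvn dependency:copy-dependencies`, the jars still have to be listed on the command line. A small sketch that builds the `-cp` value from every `.jar` in one directory, joined with `File.pathSeparator` (":" on Unix), with optional extra entries such as the module's own jar placed first. The default directory `target/dependency` is an assumption about where you run it. On Java 6 and later, a classpath wildcard entry such as `target/dependency/*` achieves the same thing without any code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: build a -cp value from the jars in one directory plus any extra entries. */
public class ClasspathFromDir {

    static String build(File dir, String... extraEntries) {
        List<String> entries = new ArrayList<>(Arrays.asList(extraEntries));
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {          // null if dir does not exist or is not a directory
            Arrays.sort(jars);       // deterministic order
            for (File jar : jars) {
                entries.add(jar.getPath());
            }
        }
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Hypothetical default: run from a module directory after copy-dependencies.
        File dir = new File(args.length > 0 ? args[0] : "target/dependency");
        System.out.println(build(dir));
    }
}
```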
>>>> On Oct 21, 2009, at 4:04 AM, 周峰 wrote:
>>>>
>>>>> First, I built a Lucene index in the directory
>>>>> "/home/zhoufeng/newdisk/newindex"; then I wanted to create vectors
>>>>> from the index files, and I ran into a problem:
>>>>> root@master:/home/zhoufeng/mahout/trunk/utils/target# java -cp
>>>>> mahout-utils-0.2-SNAPSHOT.jar:
>>>>> /home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar
>>>>> org.apache.mahout.utils.vectors.lucene.Driver
>>>>> --dir /home/zhoufeng/newdisk/newindex string
>>>>> --dictOut /home/zhoufeng/newdisk/newindex/dict.txt
>>>>> --output /home/zhoufeng/newdisk/newindex/out.txt -max 50
>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
>>>>> org/apache/commons/cli2/OptionException
>>>>> Caused by: java.lang.ClassNotFoundException:
>>>>> org.apache.commons.cli2.OptionException
>>>>>    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>>    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>>>    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>>>> Could not find the main class:
>>>>> org.apache.mahout.utils.vectors.lucene.Driver.  Program will exit.
>>>>>
>>>>> I do not know where the class
>>>>> "org.apache.commons.cli2.OptionException" is.
>>>>> Is it because a jar file is missing?
>>>>>
>>>>> Can anyone help me? Thanks.
>>>>>
>>>>>
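`NoClassDefFoundError: org/apache/commons/cli2/OptionException` means that class was not on the runtime classpath; judging by the later, working command in this thread, it is supplied by commons-cli-2.0-mahout.jar. A quick sketch that checks which classes are loadable before launching the Driver, using `Class.forName` without initializing the classes. The class names in `main` are examples taken from this thread, not an exhaustive list of the Driver's dependencies, and the jar names in the comments are the ones the thread's commands use.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: report which of the given class names cannot be loaded from the current classpath. */
public class ClasspathCheck {

    static List<String> missing(String... classNames) {
        List<String> absent = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be found; skip static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        List<String> absent = missing(
                "org.apache.commons.cli2.OptionException",         // commons-cli-2.0-mahout.jar
                "org.apache.lucene.index.IndexReader",             // lucene-core-2.9.0.jar
                "org.apache.mahout.utils.vectors.lucene.Driver");  // mahout-utils-0.2-SNAPSHOT.jar
        System.out.println(absent.isEmpty() ? "all classes found" : "missing: " + absent);
    }
}
```

Running it with the same `-cp` value you intend to give the Driver shows which jar still needs to be added.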
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>>


