mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: i have met a problem when i do "Creating Vectors from Text"
Date Thu, 22 Oct 2009 19:34:12 GMT

On Oct 22, 2009, at 12:54 PM, Ted Dunning wrote:

> Or you need to modify the Driver class to extract two different fields and
> combine the resulting vectors.  The labels should probably remember which
> field they came from.

Yes, a code change would do the trick as well, but there is currently no
out-of-the-box way to do this.
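
As a rough illustration of the combining step Ted describes, here is a minimal sketch in plain Java with no Mahout or Lucene classes. It assumes the term weights of each field have already been extracted into maps; the class FieldVectorCombiner and its methods are hypothetical names and are not part of the Driver. Each label is prefixed with its field name so that it remembers which field it came from:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Mahout: merges per-field term weights
// into one sparse vector whose labels record the source field.
class FieldVectorCombiner {

    // Copies one field's term weights into target, prefixing each label
    // with "<field>:" so identical terms from different fields stay distinct.
    static void addField(String field, Map<String, Double> termWeights,
                         Map<String, Double> target) {
        for (Map.Entry<String, Double> e : termWeights.entrySet()) {
            target.put(field + ":" + e.getKey(), e.getValue());
        }
    }

    // Combines the vectors of all fields for a single document.
    static Map<String, Double> combine(Map<String, Map<String, Double>> perField) {
        Map<String, Double> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Double>> f : perField.entrySet()) {
            addField(f.getKey(), f.getValue(), combined);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Made-up weights for one document with two fields.
        Map<String, Double> title = new LinkedHashMap<>();
        title.put("mahout", 1.0);
        Map<String, Double> body = new LinkedHashMap<>();
        body.put("mahout", 3.0);
        body.put("kmeans", 2.0);

        Map<String, Map<String, Double>> perField = new LinkedHashMap<>();
        perField.put("title", title);
        perField.put("body", body);

        System.out.println(combine(perField));
    }
}
```

In this sketch "mahout" in the title and "mahout" in the body end up as two separate dimensions. Before the result could be fed to KMeansDriver, the prefixed labels would presumably still have to be mapped to integer indices through a shared dictionary, which is left out here.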

>
> 2009/10/22 周峰 <feng2211@gmail.com>
>
>> OK, thanks for your advice.
>>
>>
>> 2009/10/22 Grant Ingersoll <gsingers@apache.org>
>>
>>> You will need to have a "catch-all" field that collects the two fields
>>> together.
>>>
>>>
>>> On Oct 22, 2009, at 4:54 AM, 周峰 wrote:
>>>
>>>> Thank you. You reminded me to store term vectors. The class
>>>> "org.apache.lucene.demo.IndexFiles" (in
>>>> http://lucene.apache.org/java/2_9_0/demo.html), which I used to create
>>>> the index files, does not store term vectors.
>>>>
>>>> Now I can run the k-means example successfully. I have another
>>>> question. When I use the class
>>>> "org.apache.mahout.utils.vectors.lucene.Driver" to create vectors from
>>>> index files, it converts a single field of the index into an output
>>>> file, and KMeansDriver can then run on that output file. In my
>>>> application, however, I want k-means to compute over multiple fields,
>>>> because I use several fields to describe one object.
>>>> How can I achieve this? Thanks.
>>>>
>>>>
>>>>
>>>> 2009/10/22 Grant Ingersoll <gsingers@apache.org>
>>>>
>>>>> Do you have Term Vectors stored?
>>>>>
>>>>>
>>>>> On Oct 21, 2009, at 9:52 AM, 周峰 wrote:
>>>>>
>>>>>> Yes, I have solved the problem. These JAR files must be added to the
>>>>>> classpath.
>>>>>
>>>>>> root@master:/home/zhoufeng/mahout/trunk/utils/target/dependency# java -cp /home/zhoufeng/mahout/trunk/utils/target/mahout-utils-0.2-SNAPSHOT.jar:lucene-core-2.9.0.jar:slf4j-api-1.5.8.jar:slf4j-jcl-1.5.8.jar:commons-logging-1.1.1.jar:commons-cli-2.0-mahout.jar:/home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar:hadoop-core-0.20.1.jar org.apache.mahout.utils.vectors.lucene.Driver --dir /home/zhoufeng/newdisk/newindex/ --field string -t /home/zhoufeng/newdisk/dict.txt --output /home/zhoufeng/newdisk/out.txt --max 50
>>>>>>
>>>>>> 2009-10-21 21:44:39 org.slf4j.impl.JCLLoggerAdapter info
>>>>>> Output File: /home/zhoufeng/newdisk/out.txt
>>>>>> 2009-10-21 21:44:40 org.apache.hadoop.util.NativeCodeLoader <clinit>
>>>>>> Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>> 2009-10-21 21:44:40 org.apache.hadoop.io.compress.CodecPool getCompressor
>>>>>> Got brand-new compressor
>>>>>> Exception in thread "main" java.lang.NullPointerException
>>>>>>    at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:110)
>>>>>>    at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:81)
>>>>>>    at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
>>>>>>    at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
>>>>>>
>>>>>> Does this exception mean that there is a problem with my index files?
>>>>>>
>>>>>> I used the class "org.apache.lucene.demo.IndexFiles" (in
>>>>>> http://lucene.apache.org/java/2_9_0/demo.html) to create my index
>>>>>> files, and I used the class "org.apache.lucene.demo.SearchFiles" (also
>>>>>> in http://lucene.apache.org/java/2_9_0/demo.html) to search the index
>>>>>> successfully.
>>>>>>
>>>>>> 2009/10/21 Grant Ingersoll <gsingers@apache.org>
>>>>>>
>>>>>>> Here's what I use to run it, as generated by IntelliJ (substitute
>>>>>>> <HOME> with your appropriate value):
>>>>>>> java -Xmx1024M -Dfile.encoding=UTF-8 -classpath <HOME>/projects/lucene/mahout/mahout-clean/utils/target/classes:<HOME>/projects/lucene/mahout/mahout-clean/core/target/classes:<HOME>/.m2/repository/org/apache/mahout/hadoop/hadoop-core/0.20.1/hadoop-core-0.20.1.jar:<HOME>/.m2/repository/org/apache/mahout/hbase/hbase/0.20.0/hbase-0.20.0.jar:<HOME>/.m2/repository/org/apache/mahout/kosmofs/kfs/0.3/kfs-0.3.jar:<HOME>/.m2/repository/org/apache/mahout/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:<HOME>/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:<HOME>/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:<HOME>/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:<HOME>/.m2/repository/commons-codec/commons-codec/1.2/commons-codec-1.2.jar:<HOME>/.m2/repository/commons-dbcp/commons-dbcp/1.2.2/commons-dbcp-1.2.2.jar:<HOME>/.m2/repository/commons-pool/commons-pool/1.4/commons-pool-1.4.jar:<HOME>/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:<HOME>/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:<HOME>/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:<HOME>/.m2/repository/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.jar:<HOME>/.m2/repository/org/slf4j/slf4j-jcl/1.5.8/slf4j-jcl-1.5.8.jar:<HOME>/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:<HOME>/.m2/repository/org/apache/mahout/watchmaker/watchmaker-framework/0.6.2/watchmaker-framework-0.6.2.jar:<HOME>/.m2/repository/org/apache/mahout/watchmaker/watchmaker-swing/0.6.2/watchmaker-swing-0.6.2.jar:<HOME>/.m2/repository/org/apache/mahout/uncommons/math/uncommons-math/1.2/uncommons-math-1.2.jar:<HOME>/.m2/repository/com/thoughtworks/xstream/xstream/1.2.1/xstream-1.2.1.jar:<HOME>/.m2/repository/xpp3/xpp3_min/1.1.3.4.O/xpp3_min-1.1.3.4.O.jar:<HOME>/.m2/repository/org/apache/lucene/lucene-analyzers/2.9.0/lucene-analyzers-2.9.0.jar:<HOME>/.m2/repository/org/apache/lucene/lucene-core/2.9.0/lucene-core-2.9.0.jar:<HOME>/.m2/repository/org/apache/mahout/commons/commons-cli/2.0-mahout/commons-cli-2.0-mahout.jar:<HOME>/.m2/repository/commons-math/commons-math/1.2/commons-math-1.2.jar:<HOME>/.m2/repository/junit/junit/3.8.2/junit-3.8.2.jar:<HOME>/.m2/repository/org/easymock/easymockclassextension/2.2/easymockclassextension-2.2.jar:<HOME>/.m2/repository/org/easymock/easymock/2.2/easymock-2.2.jar:<HOME>/.m2/repository/cglib/cglib-nodep/2.1_3/cglib-nodep-2.1_3.jar:<HOME>/.m2/repository/com/google/code/gson/gson/1.3/gson-1.3.jar:<HOME>/.m2/repository/org/easymock/easymock/2.4/easymock-2.4.jar:<HOME>/.m2/repository/org/easymock/easymockclassextension/2.4/easymockclassextension-2.4.jar:<HOME>/.m2/repository/cglib/cglib/2.1_3/cglib-2.1_3.jar:<HOME>/.m2/repository/asm/asm/1.5.3/asm-1.5.3.jar org.apache.mahout.utils.vectors.lucene.Driver --dir <HOME>/projects/lucene/solr/wikipedia/solr/data/index --field body -t <HOME>/projects/lucene/solr/wikipedia/dict.txt --output <HOME>/projects/lucene/solr/wikipedia/part-50.txt --max 50
>>>>>>>
>>>>>>> One way to quickly get all of the dependencies in a single directory
>>>>>>> for inclusion on the command line is via Maven's copy-dependencies
>>>>>>> goal:
>>>>>>> mvn dependency:copy-dependencies
>>>>>>>
>>>>>>> This will download all the dependencies under a subdir of the target
>>>>>>> dir.
>>>>>>>
>>>>>>>
>>>>>>> On Oct 21, 2009, at 4:04 AM, 周峰 wrote:
>>>>>>>
>>>>>>>> First, I built a Lucene index in my directory
>>>>>>>> "/home/zhoufeng/newdisk/newindex". Then I wanted to create vectors
>>>>>>>> from the index files, but I ran into a problem:
>>>>>>>> root@master:/home/zhoufeng/mahout/trunk/utils/target# java -cp mahout-utils-0.2-SNAPSHOT.jar:/home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar org.apache.mahout.utils.vectors.lucene.Driver --dir /home/zhoufeng/newdisk/newindex string --dictOut /home/zhoufeng/newdisk/newindex/dict.txt --output /home/zhoufeng/newdisk/newindex/out.txt -max 50
>>>>>>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli2/OptionException
>>>>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.cli2.OptionException
>>>>>>>>  at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>>>>>  at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>  at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>>>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>>>  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>>>>>  at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>>>>>>  at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>>>>>>> Could not find the main class: org.apache.mahout.utils.vectors.lucene.Driver.  Program will exit.
>>>>>>>>
>>>>>>>> I do not know where to find the class
>>>>>>>> "org.apache.commons.cli2.OptionException".
>>>>>>>> Is it because some JAR file is missing?
>>>>>>>>
>>>>>>>> Can anyone help me? Thanks.
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------
>>>>>>> Grant Ingersoll
>>>>>>> http://www.lucidimagination.com/
>>>>>>>
>>>>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/ 
>>>>>>> Droids)
>>>>>>> using
>>>>>>> Solr/Lucene:
>>>>>>> http://www.lucidimagination.com/search
>>>>>>>
>>>>>>>
>>>>>>>
>>
>
>
>
> -- 
> Ted Dunning, CTO
> DeepDyve

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search

