mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: i have met a problem when i do "Creating Vectors from Text"
Date Thu, 22 Oct 2009 16:54:48 GMT
Or you need to modify the Driver class to extract two different fields and
combine the resulting vectors.  The labels should probably remember which
field they came from.
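
A minimal sketch of the "combine the resulting vectors" step, assuming each
field has already been turned into a term-to-weight map. It uses only
java.util, not the Mahout Vector classes, and the class name
FieldVectorCombiner and the field name "title" are hypothetical, chosen for
illustration. Each label is prefixed with its field name, so the same term
in two fields stays a separate dimension:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Mahout): merges per-field term weights into one labeled sparse vector. */
public class FieldVectorCombiner {

    /**
     * Combines the term weights of several fields into one sparse vector.
     * Every label has the form "field:term", so it remembers its source field.
     */
    public static Map<String, Double> combine(Map<String, Map<String, Double>> weightsByField) {
        Map<String, Double> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, Double>> field : weightsByField.entrySet()) {
            for (Map.Entry<String, Double> term : field.getValue().entrySet()) {
                combined.put(field.getKey() + ":" + term.getKey(), term.getValue());
            }
        }
        return combined;
    }

    public static void main(String[] args) {
        // Assumed example weights for one document with two fields.
        Map<String, Double> title = new LinkedHashMap<>();
        title.put("mahout", 1.0);
        Map<String, Double> body = new LinkedHashMap<>();
        body.put("mahout", 3.0);
        body.put("kmeans", 2.0);

        Map<String, Map<String, Double>> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("body", body);

        System.out.println(combine(doc));
    }
}
```

With prefixed labels, "title:mahout" and "body:mahout" do not collide, which
also leaves room to weight the fields differently before clustering.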

2009/10/22 周峰 <feng2211@gmail.com>

> OK, thanks for your advice.
>
>
> 2009/10/22 Grant Ingersoll <gsingers@apache.org>
>
> > You will need to have a "catch-all" field that collects the two fields
> > together.
> >
> >
> > On Oct 22, 2009, at 4:54 AM, 周峰 wrote:
> >
> >> Thank you. You have reminded me to store term vectors. The class
> >> "org.apache.lucene.demo.IndexFiles" (in
> >> http://lucene.apache.org/java/2_9_0/demo.html), which I used to create
> >> the index files, does not store term vectors.
> >>
> >> Now I can run the kmeans example successfully. I have another
> >> question. When I use the class
> >> "org.apache.mahout.utils.vectors.lucene.Driver" to create vectors from
> >> index files, it converts one field of the index into an output file, and
> >> KMeansDriver can then run on that output file. In my application,
> >> however, each object is described by multiple fields, so I want kmeans
> >> to compute over multiple fields.
> >> How do I achieve this? Thanks
> >>
> >>
> >>
> >> 2009/10/22 Grant Ingersoll <gsingers@apache.org>
> >>
> >> Do you have Term Vectors stored?
> >>>
> >>>
> >>> On Oct 21, 2009, at 9:52 AM, 周峰 wrote:
> >>>
> >>>> Yes, I have solved the problem. These jar files must be added to the
> >>>> classpath.
> >>>>
> >>>> root@master:/home/zhoufeng/mahout/trunk/utils/target/dependency# java
> >>>> -cp
> >>>> /home/zhoufeng/mahout/trunk/utils/target/mahout-utils-0.2-SNAPSHOT.jar:lucene-core-2.9.0.jar:slf4j-api-1.5.8.jar:slf4j-jcl-1.5.8.jar:commons-logging-1.1.1.jar:commons-cli-2.0-mahout.jar:/home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar:hadoop-core-0.20.1.jar
> >>>> org.apache.mahout.utils.vectors.lucene.Driver --dir
> >>>> /home/zhoufeng/newdisk/newindex/ --field string -t
> >>>> /home/zhoufeng/newdisk/dict.txt --output /home/zhoufeng/newdisk/out.txt
> >>>> --max 50
> >>>>
> >>>> 2009-10-21 21:44:39 org.slf4j.impl.JCLLoggerAdapter info
> >>>> Output File: /home/zhoufeng/newdisk/out.txt
> >>>> 2009-10-21 21:44:40 org.apache.hadoop.util.NativeCodeLoader <clinit>
> >>>> Unable to load native-hadoop library for your platform... using
> >>>> builtin-java classes where applicable
> >>>> 2009-10-21 21:44:40 org.apache.hadoop.io.compress.CodecPool
> >>>> getCompressor
> >>>> Got brand-new compressor
> >>>> Exception in thread "main" java.lang.NullPointerException
> >>>>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:110)
> >>>>     at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:81)
> >>>>     at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
> >>>>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
> >>>>
> >>>> Does this exception mean there is a problem with my index files?
> >>>>
> >>>> I used the class "org.apache.lucene.demo.IndexFiles" (in
> >>>> http://lucene.apache.org/java/2_9_0/demo.html) to create my index
> >>>> files, and I used the class "org.apache.lucene.demo.SearchFiles" (also
> >>>> in http://lucene.apache.org/java/2_9_0/demo.html) to search the index
> >>>> successfully.
> >>>> 2009/10/21 Grant Ingersoll <gsingers@apache.org>
> >>>>
> >>>>> Here's what I use to run it, as generated by IntelliJ (substitute
> >>>>> <HOME> with your appropriate value):
> >>>>> java -Xmx1024M -Dfile.encoding=UTF-8 -classpath
> >>>>>
> >>>>>
> >>>>>
> >>>>> <HOME>/projects/lucene/mahout/mahout-clean/utils/target/classes:<HOME>/projects/lucene/mahout/mahout-clean/core/target/classes:<HOME>/.m2/repository/org/apache/mahout/hadoop/hadoop-core/0.20.1/hadoop-core-0.20.1.jar:<HOME>/.m2/repository/org/apache/mahout/hbase/hbase/0.20.0/hbase-0.20.0.jar:<HOME>/.m2/repository/org/apache/mahout/kosmofs/kfs/0.3/kfs-0.3.jar:<HOME>/.m2/repository/org/apache/mahout/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:<HOME>/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:<HOME>/.m2/repository/commons-logging/commons-logging/1.1.1/commons-logging-1.1.1.jar:<HOME>/.m2/repository/commons-httpclient/commons-httpclient/3.1/commons-httpclient-3.1.jar:<HOME>/.m2/repository/commons-codec/commons-codec/1.2/commons-codec-1.2.jar:<HOME>/.m2/repository/commons-dbcp/commons-dbcp/1.2.2/commons-dbcp-1.2.2.jar:<HOME>/.m2/repository/commons-pool/commons-pool/1.4/commons-pool-1.4.jar:<HOME>/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:<HOME>/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:<HOME>/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:<HOME>/.m2/repository/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.jar:<HOME>/.m2/repository/org/slf4j/slf4j-jcl/1.5.8/slf4j-jcl-1.5.8.jar:<HOME>/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:<HOME>/.m2/repository/org/apache/mahout/watchmaker/watchmaker-framework/0.6.2/watchmaker-framework-0.6.2.jar:<HOME>/.m2/repository/org/apache/mahout/watchmaker/watchmaker-swing/0.6.2/watchmaker-swing-0.6.2.jar:<HOME>/.m2/repository/org/apache/mahout/uncommons/math/uncommons-math/1.2/uncommons-math-1.2.jar:<HOME>/.m2/repository/com/thoughtworks/xstream/xstream/1.2.1/xstream-1.2.1.jar:<HOME>/.m2/repository/xpp3/xpp3_min/1.1.3.4.O/xpp3_min-1.1.3.4.O.jar:<HOME>/.m2/repository/org/apache/lucene/lucene-analyzers/2.9.0/lucene-analyzers-2.9.0.jar:<HOME>/.m2/repository/org/apache/lucene/lucene-core/2.9.0/lucene-core-2.9.0.jar:<HOME>/.m2/repository/org/apache/mahout/commons/commons-cli/2.0-mahout/commons-cli-2.0-mahout.jar:<HOME>/.m2/repository/commons-math/commons-math/1.2/commons-math-1.2.jar:<HOME>/.m2/repository/junit/junit/3.8.2/junit-3.8.2.jar:<HOME>/.m2/repository/org/easymock/easymockclassextension/2.2/easymockclassextension-2.2.jar:<HOME>/.m2/repository/org/easymock/easymock/2.2/easymock-2.2.jar:<HOME>/.m2/repository/cglib/cglib-nodep/2.1_3/cglib-nodep-2.1_3.jar:<HOME>/.m2/repository/com/google/code/gson/gson/1.3/gson-1.3.jar:<HOME>/.m2/repository/org/easymock/easymock/2.4/easymock-2.4.jar:<HOME>/.m2/repository/org/easymock/easymockclassextension/2.4/easymockclassextension-2.4.jar:<HOME>/.m2/repository/cglib/cglib/2.1_3/cglib-2.1_3.jar:<HOME>/.m2/repository/asm/asm/1.5.3/asm-1.5.3.jar
> >>>>> org.apache.mahout.utils.vectors.lucene.Driver --dir
> >>>>> <HOME>/projects/lucene/solr/wikipedia/solr/data/index --field body -t
> >>>>> <HOME>/projects/lucene/solr/wikipedia/dict.txt --output
> >>>>> <HOME>/projects/lucene/solr/wikipedia/part-50.txt --max 50
> >>>>>
> >>>>> One way to quickly get all of the dependencies in a single directory
> >>>>> for inclusion on the command line is via Maven's copy-dependencies
> >>>>> goal:
> >>>>>   mvn dependency:copy-dependencies
> >>>>>
> >>>>> This will download all the dependencies under a subdir of the target
> >>>>> dir.
> >>>>>
> >>>>>
> >>>>> On Oct 21, 2009, at 4:04 AM, 周峰 wrote:
> >>>>>
> >>>>>> First, I built a Lucene index in my directory
> >>>>>> "/home/zhoufeng/newdisk/newindex". Then I wanted to create vectors
> >>>>>> from the index files, but I ran into a problem:
> >>>>>> root@master:/home/zhoufeng/mahout/trunk/utils/target# java -cp
> >>>>>> mahout-utils-0.2-SNAPSHOT.jar:/home/zhoufeng/mahout/trunk/core/target/mahout-core-0.2-SNAPSHOT.jar
> >>>>>> org.apache.mahout.utils.vectors.lucene.Driver --dir
> >>>>>> /home/zhoufeng/newdisk/newindex string --dictOut
> >>>>>> /home/zhoufeng/newdisk/newindex/dict.txt --output
> >>>>>> /home/zhoufeng/newdisk/newindex/out.txt -max 50
> >>>>>> Exception in thread "main" java.lang.NoClassDefFoundError:
> >>>>>> org/apache/commons/cli2/OptionException
> >>>>>> Caused by: java.lang.ClassNotFoundException:
> >>>>>> org.apache.commons.cli2.OptionException
> >>>>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
> >>>>>>   at java.security.AccessController.doPrivileged(Native Method)
> >>>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
> >>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> >>>>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >>>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
> >>>>>>   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
> >>>>>> Could not find the main class:
> >>>>>> org.apache.mahout.utils.vectors.lucene.Driver.  Program will exit.
> >>>>>>
> >>>>>> I do not know where the class
> >>>>>> "org.apache.commons.cli2.OptionException" is.
> >>>>>> Is it because some jar file is missing?
> >>>>>>
> >>>>>> Can anyone help me? Thanks
> >>>>>>
> >>>>>>
> >>>>>> --------------------------
> >>>>> Grant Ingersoll
> >>>>> http://www.lucidimagination.com/
> >>>>>
> >>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
> >>>>> using
> >>>>> Solr/Lucene:
> >>>>> http://www.lucidimagination.com/search
> >>>>>
> >>>>>
> >>>>>
>



-- 
Ted Dunning, CTO
DeepDyve
