mahout-user mailing list archives

From Jeff Eastman <jeast...@Narus.com>
Subject RE: Dirichlet Clustering
Date Thu, 12 May 2011 00:07:18 GMT
Why don't you just run the Job file in examples (o.a.m.clustering.syntheticcontrol.dirichlet.Job)?
It has everything you need except the data file. Once you have gotten that running, you can
try to drive Dirichlet on your own.
Jeff

-----Original Message-----
From: Keith Thompson [mailto:kthomps6@binghamton.edu] 
Sent: Wednesday, May 11, 2011 11:59 AM
To: user@mahout.apache.org
Subject: Dirichlet Clustering

I am trying to run the example at
https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data.
Since I am new to both Hadoop and Mahout, my problem is most likely an
inadequate understanding of Hadoop at this point.  I have converted the
input file to a sequence file and am now trying to run the Dirichlet
clustering algorithm. It seems to want a VectorWritable rather than a
Text. How do I make the necessary adjustments?

k_thomp@linux-8awa:~> trunk/bin/mahout dirichlet -i output/chunk-0 -o output -x 10 -k 6
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /usr/local/hadoop-0.20.2/src/conf
11/05/10 14:40:01 INFO common.AbstractJob: Command line arguments:
{--alpha=1.0,
--distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
--emitMostLikely=true, --endPhase=2147483647, --input=output/chunk-0,
--maxIter=10, --method=mapreduce,
--modelDist=org.apache.mahout.clustering.dirichlet.models.GaussianClusterDistribution,
--modelPrototype=org.apache.mahout.math.RandomAccessSparseVector,
--numClusters=6, --output=output, --startPhase=0, --tempDir=temp,
--threshold=0}
Exception in thread "main" java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.math.VectorWritable
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.readPrototypeSize(DirichletDriver.java:250)
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.run(DirichletDriver.java:112)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.clustering.dirichlet.DirichletDriver.main(DirichletDriver.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
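The trace shows DirichletDriver.readPrototypeSize failing because the file passed with -i holds Text values, while the driver expects VectorWritable values. The missing step is turning each text line of numbers into a vector. Below is a minimal, stdlib-only sketch of just the parsing half, assuming each line of the input file is a row of whitespace-separated numbers. In Mahout each parsed row would then be wrapped in a vector (for example the RandomAccessSparseVector shown as modelPrototype in the log) inside a VectorWritable and written to a SequenceFile; that Hadoop/Mahout part is not shown. The class and method names (SyntheticControlParser, parseLine, parseAll) are hypothetical, and this is not the code the examples Job actually runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Mahout): converts lines of
 * whitespace-separated numbers into double[] rows. Assumes the text input
 * has one data point per line. Wrapping each row in a VectorWritable and
 * writing a SequenceFile would be the next step, done with Mahout/Hadoop.
 */
public class SyntheticControlParser {

    // One or more whitespace characters (spaces or tabs) separate values.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Parses one line; a blank line yields an empty array. */
    public static double[] parseLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = WHITESPACE.split(trimmed);
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            // Throws NumberFormatException if a token is not numeric.
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Parses every line, skipping blank ones, into a list of rows. */
    public static List<double[]> parseAll(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String line : lines) {
            double[] row = parseLine(line);
            if (row.length > 0) {
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative values only, not taken from the real data file.
        List<String> sample = List.of("1.0 2.5 3.0", "", "4.0  5.5\t6.0");
        List<double[]> rows = parseAll(sample);
        System.out.println(rows.size() + " rows, " + rows.get(0).length + " values each");
    }
}
```

Running main prints `2 rows, 3 values each`: the blank line is skipped and the mixed spaces and tab in the last line still split into three values. Splitting on `\s+` after trimming tolerates the irregular spacing often found in plain-text numeric data files.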
