mahout-user mailing list archives

From Paritosh Ranjan <pran...@xebia.com>
Subject Re: Does clusterdump still support option "--seqFileDir"?
Date Wed, 05 Sep 2012 08:02:54 GMT
I think it's a version/doc mismatch. The current version just takes the
input path as seqFileDir, so pass the clusters directory with --input
instead of --seqFileDir.

seqFileDir = getInputPath();
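
As a rough sketch of what that means for the command in the original message, the same call with --input in place of --seqFileDir might look like the following. The flag names are taken from the --help output quoted below; the $HOME/trunk fallback for MAHOUT_HOME is only an assumption for illustration, and I have not run this against every Mahout version.

```shell
# Assumption: MAHOUT_HOME points at a Mahout checkout whose clusterdump
# accepts --input (as in the 0.8-SNAPSHOT help output quoted below).
# The $HOME/trunk fallback is a hypothetical default, not a Mahout convention.
MAHOUT_HOME="${MAHOUT_HOME:-$HOME/trunk}"

# Same arguments as the original command, with --seqFileDir replaced by --input.
set -- clusterdump \
  --input output/clusters-10 \
  --pointsDir output/clusteredPoints \
  --output "$MAHOUT_HOME/examples/output/clusteranalyze.txt"

if [ -x "$MAHOUT_HOME/bin/mahout" ]; then
  # Run the cluster dumper with the arguments assembled above.
  "$MAHOUT_HOME/bin/mahout" "$@"
else
  # No Mahout launcher found: just show the command that would be run.
  echo "mahout not found; would run: $MAHOUT_HOME/bin/mahout $*"
fi
```

If the installed version still documents --seqFileDir, check ./mahout clusterdump --help first and use whichever option name it lists.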


On 05-09-2012 12:56, javaboom wrote:
> I've tried to use "clusterdump". I followed this manual
> https://cwiki.apache.org/MAHOUT/cluster-dumper.html
>
> I tried the following command line
>
>   $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-10 \
>     --pointsDir output/clusteredPoints \
>     --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
>
> The problem is that "clusterdump" does not recognize the option
> "--seqFileDir". I then checked the command's help output:
>
> ============================================================================
> root@ubuntu:~/trunk/bin# ./mahout clusterdump --help
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /root/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> Usage:
>   [--input <input> --output <output> --outputFormat <outputFormat>
>   --substring <substring> --numWords <numWords> --pointsDir <pointsDir>
>   --samplePoints <samplePoints> --dictionary <dictionary>
>   --dictionaryType <dictionaryType> --evaluate
>   --distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
>   --startPhase <startPhase> --endPhase <endPhase>]
> Job-Specific Options:
>    --input (-i) input                         Path to job input directory.
>    --output (-o) output                       The directory pathname for
>                                               output.
>    --outputFormat (-of) outputFormat          The optional output format to
>                                               write the results as.  Options:
>                                               TEXT, CSV or GRAPH_ML
>    --substring (-b) substring                 The number of chars of the
>                                               asFormatString() to print
>    --numWords (-n) numWords                   The number of top terms to
>                                               print
>    --pointsDir (-p) pointsDir                 The directory containing points
>                                               sequence files mapping input
>                                               vectors to their cluster.  If
>                                               specified, then the program will
>                                               output the points associated with
>                                               a cluster
>    --samplePoints (-sp) samplePoints          Specifies the maximum number of
>                                               points to include _per_ cluster.
>                                               The default is to include all
>                                               points
>    --dictionary (-d) dictionary               The dictionary file
>    --dictionaryType (-dt) dictionaryType      The dictionary file type
>                                               (text|sequencefile)
>    --evaluate (-e)                            Run ClusterEvaluator and
>                                               CDbwEvaluator over the input. The
>                                               output will be appended to the
>                                               rest of the output at the end.
>    --distanceMeasure (-dm) distanceMeasure    The classname of the
>                                               DistanceMeasure. Default is
>                                               SquaredEuclidean
>    --help (-h)                                Print out help
>    --tempDir tempDir                          Intermediate output directory
>    --startPhase startPhase                    First phase to run
>    --endPhase endPhase                        Last phase to run
> Specify HDFS directories while running on hadoop; else specify local file
> system directories
> 12/09/05 15:17:25 INFO driver.MahoutDriver: Program took 170 ms (Minutes: 0.0028333333333333335)
> ============================================================================
>
> Could you please help me? How can I solve this problem? Am I using a
> different Mahout version?
>
> Thank you in advance
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Does-clusterdump-still-support-option-seqFileDir-tp4005517.html
> Sent from the Mahout User List mailing list archive at Nabble.com.
