mahout-user mailing list archives

From Gokul Pillai <gokoolt...@gmail.com>
Subject Re: Help with running clusterdump after running Dirichlet
Date Thu, 15 Jul 2010 21:27:41 GMT
My bad. After setting HADOOP_CONF_DIR and HADOOP_HOME, I no longer get the
errors.
However, I don't get any output either.
I also tried this command, but again got no output:
./bin/mahout clusterdump --seqFileDir dirichlet/output/data/ --pointsDir
dirichlet/output/clusteredPoints/ --output dumpOut
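
For reference, here is roughly how I set things up before rerunning it (the
Hadoop paths are from my CDH3 install and are only a guess at what yours
would be):

export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/mahout clusterdump --seqFileDir dirichlet/output/data/ \
  --pointsDir dirichlet/output/clusteredPoints/ --output dumpOut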

Has anybody run clusterdump successfully?


On Thu, Jul 15, 2010 at 2:19 PM, Gokul Pillai <gokooltech@gmail.com> wrote:

> I have Cloudera's CDH3 running on Ubuntu 10.04, and Apache Mahout (a
> 0.4-SNAPSHOT build from yesterday's trunk).
>
> I was trying to get the clustering examples running based on the wiki page
> https://cwiki.apache.org/confluence/display/MAHOUT/Synthetic+Control+Data.
> At the bottom of this page, there is a section that describes how to get
> the data out and process it.
> Get the data out of HDFS and have a look:
>
>    - All example jobs use *testdata* as input and write their output to
>    the directory *output*
>    - Use *bin/hadoop fs -lsr output* to view all outputs. Copy them all
>    to your local machine and you can run the ClusterDumper on them.
>       - Sequence files containing the original points in Vector form
>       are in *output/data*
>       - Computed clusters are contained in *output/clusters-i*
>       - All resulting clustered points are placed in
>       *output/clusteredPoints*
>
>
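> As a sketch, pulling the outputs down to the local machine looked roughly
> like this (directory names assumed from the wiki's defaults and my own
> layout):
>
> bin/hadoop fs -lsr output
> mkdir -p ~/mahoutOutputs/dirichlet
> bin/hadoop fs -copyToLocal output ~/mahoutOutputs/dirichlet/output
>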
> So I got the data out of HDFS onto my local and it looks like this:
>
> hadoop@ubuntu:~/mahoutOutputs$ ls -l dirichlet/output/
> total 32
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusteredPoints
> drwxr-xr-x 2 hadoop hadoop 4096 2010-07-13 16:06 clusters-0
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-1
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-2
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-3
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-4
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-5
> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 data
>
>
> However, when I ran clusterdump on this, I got the following error. Any
> insight into why clusterdump is complaining about the "_logs" folder
> would be appreciated:
>
> hadoop@ubuntu:~/mahoutOutputs$ ../mahoutsvn/trunk/bin/mahout clusterdump
> --seqFileDir dirichlet/output/clusters-1 --pointsDir
> dirichlet/output/clusteredPoints/ --output dumpOut
> no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
> Exception in thread "main" java.io.FileNotFoundException:
> /home/hadoop/mahoutOutputs/dirichlet/output/clusteredPoints/_logs (Is a
> directory)
>     at java.io.FileInputStream.open(Native Method)
>     at java.io.FileInputStream.<init>(FileInputStream.java:106)
>     at
> org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:63)
>     at
> org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:99)
>     at
> org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:169)
>     at
> org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>     at
> org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>     at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>     at
> org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:323)
>     at
> org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:93)
>     at
> org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:86)
>     at
> org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:272)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
>
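> In case it helps: from the trace, ClusterDumper.readPoints appears to open
> every entry under clusteredPoints as a sequence file, including the _logs
> directory that the Hadoop job leaves behind. One workaround I am guessing
> at (untested) would be to move _logs aside before dumping:
>
> mv dirichlet/output/clusteredPoints/_logs /tmp/
> ../mahoutsvn/trunk/bin/mahout clusterdump \
>   --seqFileDir dirichlet/output/clusters-1 \
>   --pointsDir dirichlet/output/clusteredPoints/ --output dumpOut
>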
> Regards
> Gokul
>
