mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: Help with running clusterdump after running Dirichlet
Date Fri, 16 Jul 2010 07:24:22 GMT
I am trying to run clusterdump from trunk. It seems like it's not
outputting anything. I need to investigate:
bin/mahout clusterdump -s reuters-clusters/cluster-6/part-r-00000 -d
reuters-vectors/dictionary.file-0  -dt sequencefile -n 10 -b 100
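If clusterdump runs but writes nothing, the usual cause (as Jeff points out below) is that the directory given to -s/--seqFileDir contains no readable cluster files. A quick sanity check before invoking it, with purely illustrative paths:

```shell
# Sanity check (illustrative paths): clusterdump writes nothing when the
# directory handed to -s / --seqFileDir has no cluster part files in it.
seqdir=reuters-clusters/cluster-6
mkdir -p "$seqdir"            # stand-in for real clustering output
touch "$seqdir/part-r-00000"  # delete this line to exercise the empty case
if ls "$seqdir"/part-* >/dev/null 2>&1; then
  echo "cluster part files found in $seqdir"
else
  echo "no cluster part files in $seqdir; expect empty clusterdump output" >&2
fi
```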

On Fri, Jul 16, 2010 at 7:08 AM, Jeff Eastman <jdog@windwardsolutions.com>wrote:

> Also, it looks like you are not passing a clusters-n directory to
> --seqFileDir as you were in your first posting. ClusterDumper won't output
> anything if it cannot read clusters from that directory. Also, all the
> synthetic control jobs now call ClusterDumper automatically after
> clustering the points.
>
>
> On 7/15/10 5:58 PM, Jeff Eastman wrote:
>
>> Hi Gokul,
>>
>> Try building and running again. I committed a patch to ClusterDumper which
>> handles the _log file error when running on Hadoop.
>>
>> Jeff
>>
>> On 7/15/10 2:27 PM, Gokul Pillai wrote:
>>
>>> My bad. After setting HADOOP_CONF_DIR and HADOOP_HOME, I no longer get
>>> the errors.
>>> However, I don't get any output either.
>>> I tried this command too, but again no output:
>>> ./bin/mahout clusterdump --seqFileDir dirichlet/output/data/ --pointsDir
>>> dirichlet/output/clusteredPoints/ --output dumpOut
>>>
>>> Has anybody run clusterdump successfully?
>>>
>>>
>>> On Thu, Jul 15, 2010 at 2:19 PM, Gokul Pillai<gokooltech@gmail.com>
>>>  wrote:
>>>
>>>> I have Cloudera's CDH3 running on Ubuntu 10.04, and I have Apache
>>>> Mahout (a 0.4-SNAPSHOT build from yesterday's trunk).
>>>>
>>>> I was trying to get the clustering examples running based on the wiki
>>>> page
>>>>
>>>> https://cwiki.apache.org/confluence/display/MAHOUT/Synthetic+Control+Data.
>>>>
>>>> At the bottom of this page, there is a section that describes how to get
>>>> the data out and process it.
>>>> "Get the data out of HDFS and have a look":
>>>>
>>>>    - All example jobs use *testdata* as input and write their output to
>>>>      the directory *output*.
>>>>    - Use *bin/hadoop fs -lsr output* to view all outputs. Copy them all
>>>>      to your local machine and you can run the ClusterDumper on them.
>>>>       - Sequence files containing the original points in Vector form
>>>>         are in *output/data*.
>>>>       - Computed clusters are contained in *output/clusters-i*.
>>>>       - All result clustered points are placed into
>>>>         *output/clusteredPoints*.
>>>>
>>>>
>>>> So I got the data out of HDFS onto my local and it looks like this:
>>>>
>>>> hadoop@ubuntu:~/mahoutOutputs$ ls -l dirichlet/output/
>>>> total 32
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusteredPoints
>>>> drwxr-xr-x 2 hadoop hadoop 4096 2010-07-13 16:06 clusters-0
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-1
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-2
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-3
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-4
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 clusters-5
>>>> drwxr-xr-x 3 hadoop hadoop 4096 2010-07-13 16:06 data
>>>>
>>>>
>>>> However, when I ran clusterdump on this, I got the following error. Any
>>>> help on why clusterdump is complaining about a "_logs" folder would be
>>>> appreciated:
>>>>
>>>> hadoop@ubuntu:~/mahoutOutputs$ ../mahoutsvn/trunk/bin/mahout clusterdump \
>>>>     --seqFileDir dirichlet/output/clusters-1 \
>>>>     --pointsDir dirichlet/output/clusteredPoints/ --output dumpOut
>>>> no HADOOP_CONF_DIR or HADOOP_HOME set, running locally
>>>> Exception in thread "main" java.io.FileNotFoundException:
>>>> /home/hadoop/mahoutOutputs/dirichlet/output/clusteredPoints/_logs (Is a
>>>> directory)
>>>>     at java.io.FileInputStream.open(Native Method)
>>>>     at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>>>     at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.<init>(RawLocalFileSystem.java:63)
>>>>     at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.<init>(RawLocalFileSystem.java:99)
>>>>     at org.apache.hadoop.fs.RawLocalFileSystem.open(RawLocalFileSystem.java:169)
>>>>     at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:126)
>>>>     at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.openFile(SequenceFile.java:1437)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>>>>     at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>>>>     at org.apache.mahout.utils.clustering.ClusterDumper.readPoints(ClusterDumper.java:323)
>>>>     at org.apache.mahout.utils.clustering.ClusterDumper.init(ClusterDumper.java:93)
>>>>     at org.apache.mahout.utils.clustering.ClusterDumper.<init>(ClusterDumper.java:86)
>>>>     at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:272)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:175)
>>>>
>>>> Regards
>>>> Gokul
>>>>
>>>>
>>
>
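The FileNotFoundException in the quoted trace appears to come from ClusterDumper opening every entry under --pointsDir as a SequenceFile, including the _logs directory Hadoop leaves next to job output. Until a build with Jeff's patch is picked up, one workaround sketch (with stand-in paths) is to strip the _logs side directories from the local copy before dumping:

```shell
# Workaround sketch (stand-in paths): remove Hadoop's _logs side
# directories from the copied output so ClusterDumper does not try to
# open them as SequenceFiles.
set -e
mkdir -p dirichlet/output/clusteredPoints/_logs   # simulate the copied tree
touch dirichlet/output/clusteredPoints/part-00000
find dirichlet/output -type d -name '_logs' -prune -exec rm -r '{}' +
ls dirichlet/output/clusteredPoints   # only the part files remain
```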
