mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: Check if mahout is indeed running on Hadoop
Date Tue, 12 Nov 2013 23:45:17 GMT
Hi Fernando,

This could be related to a bug in seqdirectory (see MAHOUT-1319), wherein 'seqdirectory' ignores
the 'PrefixFilter' argument.
While this should be fixed in Mahout 0.9, could you try modifying the following in classify-20newsgroups.sh

  echo "Creating sequence files from 20newsgroups data"
  ./bin/mahout seqdirectory \
    -i ${WORK_DIR}/20news-all \
    -o ${WORK_DIR}/20news-seq -ow

to read as

  echo "Creating sequence files from 20newsgroups data"
  ./bin/mahout seqdirectory \
    -i ${WORK_DIR}/20news-all \
    -o ${WORK_DIR}/20news-seq -ow -xm sequential


Please give that a try.





On Tuesday, November 12, 2013 5:57 PM, Fernando Santos <fernandoleandro1991@gmail.com> wrote:
 
Hello everyone,

I have configured a Hadoop 1.2.1 single-node cluster and installed Mahout
0.8.

The node seems to be working correctly.

I'm trying to run the 20newsgroups mahout example on the hadoop cluster
running the cnaivebayes classifier. The problem is that I'm getting the
following error:

13/11/12 18:31:46 INFO common.AbstractJob: Command line arguments:
{--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647],
--fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter],
--input=[/tmp/mahout-work-hduser/20news-all], --keyPrefix=[],
--method=[mapreduce], --output=[/tmp/mahout-work-hduser/20news-seq],
--overwrite=null, --startPhase=[0], --tempDir=[temp]}
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/mahout-work-hduser/20news-all
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
    at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
    at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

When I check the permissions of the folder, I get this:
hduser@fernandoPC:/usr/local/mahout/core/target$ ls -l /tmp/mahout-work-hduser/
total 14136
drwxr-xr-x 22 hduser hadoop     4096 Nov 12 18:31 20news-all
drwxr-xr-x  4 hduser hadoop     4096 Nov 12 18:09 20news-bydate
-rw-r--r--  1 hduser hadoop 14464277 Nov 12 18:09 20news-bydate.tar.gz
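
A note on reading the output above: the stack trace passes through org.apache.hadoop.hdfs.DistributedFileSystem, so the missing path appears to be looked up in HDFS, whereas `ls -l` inspects the local filesystem. The two can disagree. The sketch below checks both locations for the same path. It is only an illustration: the function name `check_input_dir` and the `HADOOP_CMD` variable are made up here, and it assumes the `hadoop` launcher is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists locally and in HDFS.
# HADOOP_CMD is an illustrative variable; it defaults to the `hadoop` launcher.
check_input_dir() {
  dir="$1"
  hadoop_cmd="${HADOOP_CMD:-hadoop}"

  # Local filesystem check (this is what `ls -l` looks at).
  if [ -d "$dir" ]; then
    echo "local: present"
  else
    echo "local: missing"
  fi

  # `hadoop fs -test -e` exits with status 0 when the path exists
  # in the filesystem Hadoop is configured to use.
  if "$hadoop_cmd" fs -test -e "$dir" 2>/dev/null; then
    echo "hdfs: present"
  else
    echo "hdfs: missing"
  fi
}
```

For example, `check_input_dir /tmp/mahout-work-hduser/20news-all`. If the directory turns out to be present locally but missing in HDFS, one option would be to copy it in with `hadoop fs -put` before running seqdirectory in mapreduce mode.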

When I run the 20newsgroups choosing sgd classifier, it works correctly. I
think it's because it does not use map/reduce tasks so it is not even
running on hadoop.

Maybe it is something related to user access. I can run it as the root user,
but I'm not sure whether it runs correctly then. While it runs, I can't see any
map/reduce jobs on the JobTracker page (http://localhost:50030/jobtracker.jsp),
so I think it might be running locally rather than on the Hadoop cluster. Does
that make sense? I actually don't know whether the tasks should show up on the
JobTracker page.
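
On the question of local versus cluster execution: as far as I recall, the bin/mahout launcher of this era runs locally when HADOOP_HOME is unset or when MAHOUT_LOCAL is set to a non-empty value, and otherwise submits through Hadoop; it also prints which mode it picked at startup. The sketch below mirrors that decision so the environment can be checked before a run. The function name `mahout_run_mode` is hypothetical, and the logic is an assumption based on the launcher script, not a documented API.

```shell
#!/bin/sh
# Hypothetical helper mirroring the decision bin/mahout is assumed to make:
# local if HADOOP_HOME is empty or MAHOUT_LOCAL is non-empty, else Hadoop.
mahout_run_mode() {
  if [ -z "${HADOOP_HOME:-}" ] || [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"
  else
    echo "hadoop"
  fi
}
```

Calling `mahout_run_mode` in the shell that launches the example prints the expected mode for the current environment. In the trace quoted above, the frames for org.apache.hadoop.util.RunJar and DistributedFileSystem suggest that particular run was in fact launched through Hadoop.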

Anyway, I've been trying to solve this for days and have searched Google a lot
without finding any help. Does anyone have any idea?

PS: I'm totally new to hadoop and mahout.

-- 
Fernando Santos
+55 61 8129 8505