mahout-user mailing list archives

From Fernando Santos <fernandoleandro1...@gmail.com>
Subject Check if mahout is indeed running on Hadoop
Date Tue, 12 Nov 2013 22:56:50 GMT
Hello everyone,

I have configured a Hadoop 1.2.1 single-node cluster and installed Mahout
0.8.

The node seems to be working correctly.

I'm trying to run the 20newsgroups Mahout example on the Hadoop cluster
using the cnaivebayes classifier. The problem is that I get the
following error:

13/11/12 18:31:46 INFO common.AbstractJob: Command line arguments:
{--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647],
--fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter],
--input=[/tmp/mahout-work-hduser/20news-all], --keyPrefix=[],
--method=[mapreduce], --output=[/tmp/mahout-work-hduser/20news-seq],
--overwrite=null, --startPhase=[0], --tempDir=[temp]}
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/mahout-work-hduser/20news-all
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
    at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
    at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

When I check the permissions of the folder, I get this:
hduser@fernandoPC:/usr/local/mahout/core/target$ ls -l /tmp/mahout-work-hduser/
total 14136
drwxr-xr-x 22 hduser hadoop     4096 Nov 12 18:31 20news-all
drwxr-xr-x  4 hduser hadoop     4096 Nov 12 18:09 20news-bydate
-rw-r--r--  1 hduser hadoop 14464277 Nov 12 18:09 20news-bydate.tar.gz
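
Note that `ls -l` inspects the local filesystem, whereas the stack trace above comes from `DistributedFileSystem`, which means the failed lookup was made in HDFS. A minimal sketch for checking both places, assuming the `hadoop` command from the Hadoop 1.x install is on the PATH (`check_path` is a hypothetical helper name, not part of Hadoop or Mahout):

```shell
#!/bin/sh
# Hypothetical helper: report whether a path exists on the local
# filesystem and, if the `hadoop` command is available, whether it
# also exists in the configured default filesystem (HDFS on a cluster).
check_path() {
    path="$1"
    if [ -e "$path" ]; then
        echo "local: present"
    else
        echo "local: missing"
    fi
    if command -v hadoop >/dev/null 2>&1; then
        # `hadoop fs -test -e` exits with status 0 when the path exists.
        if hadoop fs -test -e "$path"; then
            echo "hdfs: present"
        else
            echo "hdfs: missing"
        fi
    else
        echo "hdfs: skipped (hadoop not on PATH)"
    fi
}

check_path /tmp/mahout-work-hduser/20news-all
```

If this reports the directory as present locally but missing in HDFS, that would be consistent with the exception. Copying the data in with `hadoop fs -put` is one possible way to test that theory, though whether the example script expects this is an assumption.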

When I run the 20newsgroups example with the sgd classifier, it works
correctly. I think that's because it does not use map/reduce tasks, so it
is not running on Hadoop at all.

Maybe it is related to user access. I can run it as the root user, but I'm
not sure it runs correctly then. While it runs, I can't see any map/reduce
jobs on the JobTracker page (http://localhost:50030/jobtracker.jsp), so I
think it might be running locally instead of on the Hadoop cluster. Does
that make sense? I don't actually know whether the tasks should show up on
this JobTracker page.
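
One way to reason about local versus cluster execution is to look at the environment the `bin/mahout` launcher sees. The sketch below assumes the launcher runs locally when `MAHOUT_LOCAL` is non-empty or `HADOOP_HOME` is unset; that rule is an assumption about the launcher script and should be checked against the installed version. `mahout_mode` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: guess whether bin/mahout would run locally or on
# Hadoop, ASSUMING it runs locally when MAHOUT_LOCAL is non-empty or
# HADOOP_HOME is empty. Verify this rule against your bin/mahout script.
mahout_mode() {
    if [ -n "${MAHOUT_LOCAL:-}" ] || [ -z "${HADOOP_HOME:-}" ]; then
        echo "local"
    else
        echo "hadoop"
    fi
}

echo "Mahout would likely run in: $(mahout_mode) mode"
```

On a Hadoop 1.x cluster, `hadoop job -list` can also be run while the example is executing to see whether any job was actually submitted to the JobTracker.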

Anyway, I've been trying to solve this for days. I've searched Google a lot
and didn't find any help. Does anyone have any idea?

PS: I'm totally new to Hadoop and Mahout.

-- 
Fernando Santos
+55 61 8129 8505
