mahout-user mailing list archives

From Fernando Santos <fernandoleandro1...@gmail.com>
Subject Re: Check if mahout is indeed running on Hadoop
Date Wed, 13 Nov 2013 19:59:59 GMT
Hello Suneel,

Thank you for the tip. It was indeed the bug, and adding "-xm sequential"
solved that problem. But then I got a similar error while testing the
classifier (./bin/mahout testnb). It seems to be an error about
permissions again. Maybe another bug? =P

13/11/13 17:39:21 WARN driver.MahoutDriver: No testnb.props found on
classpath, will use command-line arguments only
13/11/13 17:39:21 INFO common.AbstractJob: Command line arguments:
{--endPhase=[2147483647],
--input=[/tmp/mahout-work-hduser/20news-train-vectors],
--labelIndex=[/tmp/mahout-work-hduser/labelindex],
--model=[/tmp/mahout-work-hduser/model],
--output=[/tmp/mahout-work-hduser/20news-testing], --overwrite=null,
--startPhase=[0], --tempDir=[temp], --testComplementary=null}
13/11/13 17:39:22 INFO mapred.JobClient: Cleaning up the staging area
hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201311131709_0036
13/11/13 17:39:22 ERROR security.UserGroupInformation: PriviledgedActionException as:hduser cause:java.io.FileNotFoundException: File does not exist: /tmp/mahout-work-hduser/model
Exception in thread "main" java.io.FileNotFoundException: File does not exist: /tmp/mahout-work-hduser/model
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
    at org.apache.hadoop.filecache.DistributedCache.getFileStatus(DistributedCache.java:185)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:723)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:792)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.determineTimestampsAndCacheVisibilities(TrackerDistributedCacheManager.java:755)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:843)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:734)
    at org.apache.hadoop.mapred.JobClient.access$400(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.runMapReduce(TestNaiveBayesDriver.java:141)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:109)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)


I checked the working directory, and indeed the model folder wasn't
created. So I think the problem is that the /model folder is not being
generated at all.

hduser@fernandoPC:/usr/local/mahout$ ls /tmp/mahout-work-hduser/
20news-all  20news-bydate  20news-bydate.tar.gz

hduser@fernandoPC:/usr/local/mahout$ ls -l /tmp
drwxr-xr-x  4 hduser   hadoop      4096 Nov 13 17:35 mahout-work-hduser
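One thing worth noting here: the exception above refers to
hdfs://localhost:54310 paths, so when the jobs run in distributed mode the
work dir the job looks at is on HDFS, and a plain `ls` on the local disk may
not tell the whole story. A small diagnostic sketch for comparing both views
of the same path (WORK_DIR is taken from the log output above; this is not
part of classify-20newsgroups.sh, and the HDFS check is only attempted when
the hadoop CLI is on the PATH):

```shell
#!/bin/sh
# Diagnostic sketch: compare the local-filesystem view and the HDFS view
# of the Mahout work dir, since the FileNotFoundException refers to an
# hdfs://localhost:54310 path while `ls` only shows the local disk.
WORK_DIR=/tmp/mahout-work-hduser   # taken from the log output above

local_view=$(ls -ld "$WORK_DIR/model" 2>/dev/null || echo "(not present locally)")
echo "Local view:  $local_view"

if command -v hadoop >/dev/null 2>&1; then
  hdfs_view=$(hadoop fs -ls "$WORK_DIR/model" 2>/dev/null || echo "(not present on HDFS)")
else
  hdfs_view="(hadoop CLI not on PATH, skipping HDFS check)"
fi
echo "HDFS view:   $hdfs_view"
```

If the model shows up in neither view, trainnb really never produced it, and
the failed mapper attempts below are the more likely cause.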


Also, while training the classifier, three task attempts failed with the
exceptions below. I don't know whether they are relevant to the error or not:

13/11/13 17:38:58 INFO mapred.JobClient: Task Id : attempt_201311131709_0035_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
    at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/13 17:39:03 INFO mapred.JobClient: Task Id : attempt_201311131709_0035_m_000000_1, Status : FAILED
java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
    at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/13 17:39:09 INFO mapred.JobClient: Task Id : attempt_201311131709_0035_m_000000_2, Status : FAILED
java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
    at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/13 17:39:16 INFO mapred.JobClient: Job complete: job_201311131709_0035
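Incidentally, since the subject of this thread is checking whether Mahout is
really running on Hadoop: the attempt_... and job_... ids in the training
output above do suggest these tasks went through the JobTracker rather than
the in-process LocalJobRunner. A small sketch of that check, grepping a
captured console log (the `log_mode` helper and demo.log file are just
illustrations, not Mahout or Hadoop API; the demo line is copied from the
output above):

```shell
#!/bin/sh
# Sketch: classify a captured Mahout/Hadoop console log as having run
# locally (LocalJobRunner messages) or on the cluster (real
# job_<timestamp>_<id> identifiers from the JobTracker).
log_mode() {
  if grep -q "LocalJobRunner" "$1"; then
    echo local
  elif grep -Eq "job_[0-9]+_[0-9]+" "$1"; then
    echo cluster
  else
    echo unknown
  fi
}

# Demo on a line copied from the training output above:
printf '13/11/13 17:39:16 INFO mapred.JobClient: Job complete: job_201311131709_0035\n' > demo.log
echo "demo.log looks like a $(log_mode demo.log) run"   # prints "cluster"
rm -f demo.log
```

To use it on a real run, capture the console output first, e.g.
`./examples/bin/classify-20newsgroups.sh 2>&1 | tee run.log` (the run.log
name is just an example), and pass that file to the helper.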



Any ideas?

Thanks!


2013/11/12 Suneel Marthi <suneel_marthi@yahoo.com>

> Hi Fernando,
>
> This could be related to a bug (see MAHOUT-1319) in seqdirectory, wherein
> 'seqdirectory' ignores the 'PrefixFilter' argument.
> While this should be fixed in Mahout 0.9, could you try modifying the
> following in classify-20newsgroups.sh
>
>      echo "Creating sequence files from 20newsgroups data"
>   ./bin/mahout seqdirectory \
>     -i ${WORK_DIR}/20news-all \
>     -o ${WORK_DIR}/20news-seq -ow
>
> to read as
>
>    echo "Creating sequence files from 20newsgroups data"
>   ./bin/mahout seqdirectory \
>     -i ${WORK_DIR}/20news-all \
>     -o ${WORK_DIR}/20news-seq -ow -xm sequential
>
>
> Please give that a try.
>
>
>
>
>
> On Tuesday, November 12, 2013 5:57 PM, Fernando Santos <
> fernandoleandro1991@gmail.com> wrote:
>
> Hello everyone,
>
> I have configured a Hadoop 1.2.1 single-node cluster and installed Mahout
> 0.8.
>
> The node seems to be working correctly.
>
> I'm trying to run the 20newsgroups Mahout example on the Hadoop cluster
> with the cnaivebayes classifier. The problem is that I'm getting the
> following error:
>
> 13/11/12 18:31:46 INFO common.AbstractJob: Command line arguments:
> {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647],
> --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter],
> --input=[/tmp/mahout-work-hduser/20news-all], --keyPrefix=[],
> --method=[mapreduce], --output=[/tmp/mahout-work-hduser/20news-seq],
> --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> Exception in thread "main" java.io.FileNotFoundException: File does not
> exist: /tmp/mahout-work-hduser/20news-all
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:63)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
>
> When I check the permissions of the folder I get this:
> hduser@fernandoPC:/usr/local/mahout/core/target$ ls -l
> /tmp/mahout-work-hduser/
> total 14136
> drwxr-xr-x 22 hduser hadoop     4096 Nov 12 18:31 20news-all
> drwxr-xr-x  4 hduser hadoop     4096 Nov 12 18:09 20news-bydate
> -rw-r--r--  1 hduser hadoop 14464277 Nov 12 18:09 20news-bydate.tar.gz
>
> When I run 20newsgroups with the sgd classifier, it works correctly. I
> think that's because sgd does not use map/reduce tasks, so it is not
> actually running on Hadoop.
>
> Maybe it is something related to user access. I can run it as the root
> user, but I'm not sure it runs correctly then. While it runs, I can't see
> any map/reduce jobs on the JobTracker page (
> http://localhost:50030/jobtracker.jsp), so I think it might be running
> locally rather than on the Hadoop cluster. Does that make sense? I
> actually don't know whether the running tasks should be showing up on
> this JobTracker page..
>
> Anyway, I've been trying to solve this for days; I've googled a lot and
> didn't find any help. Does anyone have any idea?
>
> PS: I'm totally new to hadoop and mahout.
>
> --
> Fernando Santos
> +55 61 8129 8505
>



-- 
Fernando Santos
+55 61 8129 8505
