mahout-user mailing list archives

From Robin Anil <robin.a...@gmail.com>
Subject Re: "Not a file" issue with TwentyNewsGroups
Date Wed, 07 Apr 2010 00:06:12 GMT
I am guessing that you *didn't* convert the 20newsgroups data into the required
format, and that this is what caused the error. Is my guess right?

Robin
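
For illustration, here is a rough sketch of why the unconverted layout would trigger this error. The exception in the trace below comes from FileInputFormat.getSplits, which in the old mapred API rejects an input entry that is a directory rather than a plain file, and comp.graphics appears to be a directory. The Python below only mimics that check on the local filesystem. It is not Hadoop or Mahout code, and the function name check_flat_input and both directory layouts are made up for the example.

```python
import os
import tempfile


def check_flat_input(input_dir):
    """Return the number of files in input_dir, or raise if a child is a directory.

    Hypothetical local-filesystem analogue of the "Not a file" check; this is
    not the Hadoop implementation.
    """
    names = sorted(os.listdir(input_dir))
    for name in names:
        path = os.path.join(input_dir, name)
        if os.path.isdir(path):
            # A nested directory is not a readable input file.
            raise IOError("Not a file: " + path)
    return len(names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Raw layout: one sub-directory per newsgroup (assumed for illustration).
        raw = os.path.join(root, "raw")
        os.makedirs(os.path.join(raw, "comp.graphics"))
        try:
            check_flat_input(raw)
        except IOError as err:
            print("raw layout rejected:", err)

        # Converted layout: one plain file per newsgroup (also an assumption).
        flat = os.path.join(root, "flat")
        os.makedirs(flat)
        with open(os.path.join(flat, "comp.graphics.txt"), "w") as handle:
            handle.write("comp.graphics\tsome tokenized text\n")
        print("flat layout accepted, files:", check_flat_input(flat))
```

Being able to ls and cat the files does not rule this out: the files are readable, but the job's input directory still contains sub-directories.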

On Wed, Apr 7, 2010 at 3:29 AM, Grant Ingersoll <gsingers@apache.org> wrote:

> What are the commands you are running?
>
> On Apr 5, 2010, at 9:59 AM, Adam Hammer wrote:
>
> > Hello all,
> >
> > I am just starting out with Mahout, and to get my feet wet I am running
> > through the TwentyNewsGroups example.  I have successfully configured a
> > single node Hadoop system as well as a pseudo-distributed Hadoop system on
> > two separate machines.  On both environments, I have gone through the guide
> > successfully to put all the news inputs into the folder 20news-input.  I am
> > able to successfully ls and cat the files in the directory.
> >
> > However, when I go to run the TrainClassifier, I am getting the following
> > message:
> >
> > 10/04/05 09:48:33 INFO bayes.TrainClassifier: Training Complementary Bayes Classifier
> > 10/04/05 09:48:33 INFO cbayes.CBayesDriver: Reading features...
> > 10/04/05 09:48:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> > 10/04/05 09:48:33 INFO mapred.FileInputFormat: Total input paths to process : 19
> > Exception in thread "main" java.io.IOException: Not a file: hdfs://localhost:9000/user/bob/20news-input/comp.graphics
> >    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:206)
> >    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >    at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:75)
> >    at org.apache.mahout.classifier.bayes.mapreduce.cbayes.CBayesDriver.runJob(CBayesDriver.java:61)
> >    at org.apache.mahout.classifier.bayes.TrainClassifier.trainCNaiveBayes(TrainClassifier.java:56)
> >    at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:128)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >
> > I get this error on both the single node system I have set up, as well as the
> > separate dual-node system.  As I said before, I am able to cat and ls that
> > directory and the files in it perfectly fine.  Any thoughts?
> >
> > Thanks!
>
>
