mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: SSVD error
Date Sat, 01 Sep 2012 15:12:16 GMT
It seems it is running in local mode and using the local filesystem, which is
fine. The unit tests do exactly the same thing and run fine. In this case a
step is looking for the output of the previous step, and the raw file system
says it cannot find the file, but you are saying the file is there. So I don't
know; something apparently affects the local file system implementation. Local
mode does not require HDFS or jobtracker access and should be able to run
completely isolated from external requirements aside from the availability of
the Hadoop libraries (and it looks like those are available in your setup).
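For reference, this is roughly what classic (Hadoop 1.x era) local-mode configuration looks like; the property names are from stock Hadoop of that vintage, and your setup may of course differ:

```xml
<!-- core-site.xml / mapred-site.xml: everything runs in one JVM
     against the local filesystem; no HDFS or jobtracker needed -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
```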

I can also confirm that I never had any problem running the local solver from
Eclipse without Hadoop running. I don't know about IDEA.

The second error is what we discussed before: if you request more dimensions
than the input rank supports, it will cause a blocking deficiency.
Pragmatically, k+p should be <= min(m,n), which translates to what Ted said.
At least k+p <= 47 in your case.
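The constraint above can be sketched as a quick pre-flight check. This is only an illustration of the rule, not Mahout API; the function name and signature are made up:

```python
def check_ssvd_geometry(k, p, m, n):
    """Sanity-check SSVD decomposition parameters.

    k: requested decomposition rank
    p: oversampling parameter
    m, n: rows and columns of the input matrix
    """
    # The sketch uses k + p dimensions, which cannot exceed the
    # maximum possible rank of an m x n matrix, min(m, n).
    if k + p > min(m, n):
        raise ValueError(
            "k + p = %d exceeds min(m, n) = %d; reduce k or p"
            % (k + p, min(m, n)))

# Pat's matrix is 57 docs x ~2200 terms, so min(m, n) = 57:
check_ssvd_geometry(k=20, p=20, m=57, n=2200)    # fine
# check_ssvd_geometry(k=20, p=100, m=57, n=2200) # would raise
```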
 On Sep 1, 2012 7:39 AM, "Pat Ferrel" <pat.ferrel@gmail.com> wrote:

> I have a small data set that I am using in local mode for debugging
> purposes. The data is 57 crawled docs with something like 2200 terms. I run
> this through seq2sparse, then my own cloned version of rowid to get a
> distributed row matrix, then into SSVD. I realize this is not a production
> environment, but you need to debug somewhere and single threaded execution
> is ideal for debugging. As I said this works in hadoop clustered mode.
>
> The error looks like some code is expecting hdfs to be running, no? Here
> is the exception stack from the ide with p = 20:
>
> 12/09/01 07:22:55 WARN mapred.LocalJobRunner: job_local_0002
> java.io.FileNotFoundException: File
> /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
> does not exist.
>         at
> org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
>         at
> org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>         at
> org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Exception in thread "main" java.io.IOException: Bt job unsuccessful.
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
>         at
> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> Disconnected from the target VM, address: '127.0.0.1:54588', transport:
> 'socket'
>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
>
> Process finished with exit code 1
>
> With p=100-200 I get the following:
>
> 12/09/01 07:30:33 ERROR common.IOUtils: new m can't be less than n
> java.lang.IllegalArgumentException: new m can't be less than n
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
>         at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/09/01 07:30:33 WARN mapred.LocalJobRunner: job_local_0001
> java.lang.IllegalArgumentException: new m can't be less than n
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
>         at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>         at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Exception in thread "main" java.io.IOException: Q job unsuccessful.
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.QJob.run(QJob.java:230)
>         at
> org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:376)
>         at
> com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
>         at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
> Disconnected from the target VM, address: '127.0.0.1:54614', transport:
> 'socket'
>
> Process finished with exit code 1
>
>
>
>
> On Aug 31, 2012, at 4:21 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
> Perhaps if you give more info about the stack etc. i might get a
> better idea though
>
> On Fri, Aug 31, 2012 at 4:19 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
> wrote:
> > I am not sure, I haven't used it that way.
> >
> > I know it works fully distributed AND when embedded with a local job
> > tracker (e.g. its tests are basically MR jobs with a "local" job
> > tracker), which is probably not the same as Mahout local mode. The
> > "local" job tracker is not good for much, though: it doesn't even use
> > multicore parallelism, since it doesn't support multiple reducers, so
> > pragmatically this code is really meant for a real cluster. There's also
> > Ted's implementation of non-distributed SSVD in Mahout, which does not
> > require Hadoop dependencies, but it is a different API with no PCA
> > option (not sure about power iterations).
> >
> > I am not sure why this very particular error appears in your setup.
> >
> > On Fri, Aug 31, 2012 at 3:02 PM, Pat Ferrel <pat.ferrel@gmail.com>
> wrote:
> >> Running on the local file system inside IDEA with MAHOUT_LOCAL set and
> performing an SSVD I get the error below. Notice that R-m-00000 exists in
> the local file system and running it outside the debugger in pseudo-cluster
> mode with HDFS works. Does SSVD work in local mode?
> >>
> >> java.io.FileNotFoundException: File
> /tmp/hadoop-pat/mapred/local/archive/5543644668644532045_1587570556_2120541978/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000
> does not exist.
> >>
> >> Maclaurin:big-data pat$ ls -al b/ssvd/Q-job/
> >> total 72
> >> drwxr-xr-x  10 pat  staff   340 Aug 31 13:35 .
> >> drwxr-xr-x   4 pat  staff   136 Aug 31 13:35 ..
> >> -rw-r--r--   1 pat  staff    80 Aug 31 13:35 .QHat-m-00000.crc
> >> -rw-r--r--   1 pat  staff    28 Aug 31 13:35 .R-m-00000.crc
> >> -rw-r--r--   1 pat  staff     8 Aug 31 13:35 ._SUCCESS.crc
> >> -rw-r--r--   1 pat  staff    12 Aug 31 13:35 .part-m-00000.deflate.crc
> >> -rwxrwxrwx   1 pat  staff  9154 Aug 31 13:35 QHat-m-00000
> >> -rwxrwxrwx   1 pat  staff  2061 Aug 31 13:35 R-m-00000
> >> -rwxrwxrwx   1 pat  staff     0 Aug 31 13:35 _SUCCESS
> >> -rwxrwxrwx   1 pat  staff     8 Aug 31 13:35 part-m-00000.deflate
> >>
>
>
