mahout-user mailing list archives

From Pat Ferrel <pat.fer...@gmail.com>
Subject Re: SSVD error
Date Sat, 01 Sep 2012 14:32:28 GMT
I have a small data set that I am using in local mode for debugging. The data is 57
crawled docs with roughly 2,200 terms. I run it through seq2sparse, then through my own cloned
version of rowid to get a distributed row matrix, then into SSVD. I realize this is not a
production environment, but you need to debug somewhere, and single-threaded execution is ideal
for debugging. As I said, this works in Hadoop clustered mode.
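
For context on the job names in the traces, here is the textbook stochastic (randomized) SVD outline. That Mahout's Q and Bt jobs follow it in broad strokes is an assumption; this is not taken from their code. A is the m × n document-term matrix (about 57 × 2,200 for this corpus), k is the requested rank, and p is the oversampling parameter.

```latex
\begin{aligned}
&\Omega \in \mathbb{R}^{n \times (k+p)} \text{ random}, \qquad Y = A\,\Omega \in \mathbb{R}^{m \times (k+p)} \\
&Y = Q\,R \quad \text{(thin QR, which needs } m \ge k+p\text{; Q job)} \\
&B = Q^{\top} A \in \mathbb{R}^{(k+p) \times n} \quad \text{(Bt job, computed as } B^{\top} = A^{\top} Q\text{)} \\
&B = \hat{U}\,\Sigma\,V^{\top}, \qquad U \approx Q\,\hat{U}, \qquad A \approx U_k\,\Sigma_k\,V_k^{\top}
\end{aligned}
```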

The error looks like some code expects HDFS to be running, no? Here is the exception
stack from the IDE with p = 20:

12/09/01 07:22:55 WARN mapred.LocalJobRunner: job_local_0002
java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/6590995089539988730_1587570556_37122331/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
	at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:92)
	at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.setup(BtJob.java:219)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Bt job unsuccessful.
	at org.apache.mahout.math.hadoop.stochasticsvd.BtJob.run(BtJob.java:609)
	at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:397)
	at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
	at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:54588', transport: 'socket'

Process finished with exit code 1
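
One detail in the path above: the missing file sits under the task's local archive directory, and the segment after the numeric id is `file` followed by the original absolute path (`/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000`). Below is a minimal, hypothetical JDK-only sketch for checking that by hand. It assumes a `<archive dir>/<id>/<scheme>/<original path>` layout, which is inferred from this log rather than from Hadoop documentation; `CachePathProbe`, `localizedPath`, and `probe` are made-up names.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical debugging helper, not part of Hadoop or Mahout.
// Assumes the layout seen in the log: <archiveDir>/<scheme>/<original absolute path>.
public final class CachePathProbe {

    private CachePathProbe() {}

    /** Path under archiveDir where the localized copy of {@code original} would live. */
    public static Path localizedPath(Path archiveDir, URI original) {
        if (original.getScheme() == null) {
            throw new IllegalArgumentException("URI needs a scheme, e.g. file:/...");
        }
        // Strip leading slashes so the absolute path nests under the scheme directory.
        String relative = original.getPath().replaceFirst("^/+", "");
        return archiveDir.resolve(original.getScheme()).resolve(relative);
    }

    /** Reports whether the localized copy and the original local file exist. */
    public static String probe(Path archiveDir, URI original) {
        Path cached = localizedPath(archiveDir, original);
        boolean cachedExists = Files.exists(cached);
        // Only file: URIs can be checked directly on the local file system.
        boolean originalExists = "file".equals(original.getScheme())
                && Files.exists(Paths.get(original));
        return "cached=" + cachedExists + " original=" + originalExists + " -> " + cached;
    }

    public static void main(String[] args) {
        // Placeholder defaults; pass the real archive/<id> directory and file URI instead.
        Path archiveDir = Paths.get(args.length > 0 ? args[0] : "/tmp/archive-id");
        URI file = URI.create(args.length > 1 ? args[1] : "file:/tmp/ssvd/Q-job/R-m-00000");
        System.out.println(probe(archiveDir, file));
    }
}
```

Run it with the archive directory (including the numeric id) and the `file:` URI of R-m-00000. A result of `cached=false original=true` would mean the file exists where the job wrote it but was never copied into the local cache, which is consistent with the exception above.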

With p between 100 and 200, I get the following:

12/09/01 07:30:33 ERROR common.IOUtils: new m can't be less than n
java.lang.IllegalArgumentException: new m can't be less than n
	at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
	at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
	at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
	at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
	at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
12/09/01 07:30:33 WARN mapred.LocalJobRunner: job_local_0001
java.lang.IllegalArgumentException: new m can't be less than n
	at org.apache.mahout.math.hadoop.stochasticsvd.qr.GivensThinSolver.adjust(GivensThinSolver.java:109)
	at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.cleanup(QRFirstStep.java:233)
	at org.apache.mahout.math.hadoop.stochasticsvd.qr.QRFirstStep.close(QRFirstStep.java:89)
	at org.apache.mahout.common.IOUtils.close(IOUtils.java:128)
	at org.apache.mahout.math.hadoop.stochasticsvd.QJob$QMapper.cleanup(QJob.java:158)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
Exception in thread "main" java.io.IOException: Q job unsuccessful.
	at org.apache.mahout.math.hadoop.stochasticsvd.QJob.run(QJob.java:230)
	at org.apache.mahout.math.hadoop.stochasticsvd.SSVDSolver.run(SSVDSolver.java:376)
	at com.finderbots.analysis.AnalysisPipeline.SSVDTransformAndBack(AnalysisPipeline.java:257)
	at com.finderbots.analysis.AnalysisJob.run(AnalysisJob.java:20)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at com.finderbots.analysis.AnalysisJob.main(AnalysisJob.java:34)
Disconnected from the target VM, address: '127.0.0.1:54614', transport: 'socket'

Process finished with exit code 1
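
The "new m can't be less than n" message comes from the Givens thin QR used in the Q job. A plausible reading, stated as an assumption rather than a confirmed diagnosis, is that the block of rows seen by one mapper (m) is smaller than the block width k + p (n); with only 57 documents, p = 100 alone already exceeds the row count. A minimal JDK-only sketch of that pre-flight check follows. `SsvdPrecheck` and its methods are hypothetical, not Mahout API, and k = 10 is a made-up rank because the thread does not state k.

```java
// Hypothetical pre-flight check, not part of Mahout. Assumes the Q step's
// thin QR needs at least k + p rows in every input split.
public final class SsvdPrecheck {

    private SsvdPrecheck() {}

    /** Rows each split must contain, assuming the block width is k + p. */
    public static int minRowsPerSplit(int k, int p) {
        if (k <= 0 || p < 0) {
            throw new IllegalArgumentException("need k > 0 and p >= 0");
        }
        return k + p;
    }

    /** True if a split with this many rows can feed a thin QR of width k + p. */
    public static boolean splitIsLargeEnough(long rowsInSplit, int k, int p) {
        return rowsInSplit >= minRowsPerSplit(k, p);
    }

    public static void main(String[] args) {
        long rows = 57; // the 57-document corpus, assumed to land in a single split
        int k = 10;     // hypothetical rank; the thread does not state k
        for (int p : new int[] {20, 100, 200}) {
            System.out.printf("k=%d p=%d -> %s%n", k, p,
                    splitIsLargeEnough(rows, k, p) ? "ok" : "too few rows");
        }
    }
}
```

With these values the sketch prints `ok` for p = 20 and `too few rows` for p = 100 and 200. That lines up with the Q job surviving at p = 20 (the first trace fails later, in the Bt job) and failing for p between 100 and 200.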




On Aug 31, 2012, at 4:21 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

Perhaps if you give more info about the stack, etc., I might get a
better idea, though.

On Fri, Aug 31, 2012 at 4:19 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> I am not sure; I haven't used it that way.
> 
> I know it works fully distributed AND when embedded with the local job
> tracker (e.g., its tests are basically MR jobs run with the "local" job
> tracker), which is probably not the same as Mahout local mode. The "local"
> job tracker is not good for much, though: it doesn't even use multicore
> parallelism, since it doesn't support multiple reducers, so pragmatically
> this code is really meant for a real cluster. There's also Ted's
> implementation of non-distributed SSVD in Mahout, which does not require
> Hadoop dependencies, but it has a different API with no PCA option (not
> sure about power iterations).
> 
> I am not sure why this very particular error appears in your setup.
> 
> On Fri, Aug 31, 2012 at 3:02 PM, Pat Ferrel <pat.ferrel@gmail.com> wrote:
>> Running on the local file system inside IDEA with MAHOUT_LOCAL set and performing
>> an SSVD, I get the error below. Notice that R-m-00000 exists in the local file system, and running
>> it outside the debugger in pseudo-cluster mode with HDFS works. Does SSVD work in local mode?
>> 
>> java.io.FileNotFoundException: File /tmp/hadoop-pat/mapred/local/archive/5543644668644532045_1587570556_2120541978/file/Users/pat/Projects/big-data/b/ssvd/Q-job/R-m-00000 does not exist.
>> 
>> Maclaurin:big-data pat$ ls -al b/ssvd/Q-job/
>> total 72
>> drwxr-xr-x  10 pat  staff   340 Aug 31 13:35 .
>> drwxr-xr-x   4 pat  staff   136 Aug 31 13:35 ..
>> -rw-r--r--   1 pat  staff    80 Aug 31 13:35 .QHat-m-00000.crc
>> -rw-r--r--   1 pat  staff    28 Aug 31 13:35 .R-m-00000.crc
>> -rw-r--r--   1 pat  staff     8 Aug 31 13:35 ._SUCCESS.crc
>> -rw-r--r--   1 pat  staff    12 Aug 31 13:35 .part-m-00000.deflate.crc
>> -rwxrwxrwx   1 pat  staff  9154 Aug 31 13:35 QHat-m-00000
>> -rwxrwxrwx   1 pat  staff  2061 Aug 31 13:35 R-m-00000
>> -rwxrwxrwx   1 pat  staff     0 Aug 31 13:35 _SUCCESS
>> -rwxrwxrwx   1 pat  staff     8 Aug 31 13:35 part-m-00000.deflate
>> 

