mahout-user mailing list archives

From: "Peter M. Goldstein" <peter_m_goldst...@yahoo.com>
Subject: RE: DistributedRowMatrix.transpose().transpose() = Exception
Date: Wed, 07 Jul 2010 16:54:25 GMT

Hi Laszlo,

The exception message:

org.apache.mahout.math.IndexException: Index 31 is outside allowable range of [0,31]

is a little misleading: for this case it should read "allowable range of [0,30]" (or the
half-open "[0,31)"), as the index is required to be strictly less than the size of the vector.
So somehow the column index is being set to 31 (and possibly higher values) during the second
transpose.
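
To make the bound concrete, here is a minimal sketch in plain Java of the kind of check that
produces such a message. The names BoundsSketch and checkIndex are hypothetical, and this is
not Mahout's actual code; it only illustrates that a vector of size 31 accepts indices 0
through 30 and rejects 31.

```java
class BoundsSketch {
    // Hypothetical helper, not Mahout's API: valid indices are 0 .. size-1.
    static void checkIndex(int index, int size) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(
                "Index " + index + " is outside allowable range of [0," + size + ")");
        }
    }

    public static void main(String[] args) {
        int size = 31;
        checkIndex(30, size); // last valid index, passes silently
        try {
            checkIndex(31, size); // one past the end, rejected
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```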

To me this suggests that there may be an off-by-one error introduced into the index during
the transpose process.  This might not cause an error in the original transpose if the input
data has zeroes in the right place.
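
As a rough illustration of how a bad index could pass the first transpose and fail only in
the second, here is a toy in-memory model in plain Java. It is not Mahout's TransposeJob.
The class TransposeSketch, the map-of-maps representation, and the stray entry at column 31
are all assumptions made up for the sketch. The model assumes that a transpose only
range-checks the value it writes as an index into an output vector, which is consistent
with the stack trace but not confirmed.

```java
import java.util.Map;
import java.util.TreeMap;

class TransposeSketch {
    // Sparse matrix as rowKey -> (index -> value). Hypothetical model, not Mahout's format.
    static Map<Integer, Map<Integer, Double>> transpose(
            Map<Integer, Map<Integer, Double>> rows, int numRows) {
        Map<Integer, Map<Integer, Double>> out = new TreeMap<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            int r = row.getKey();
            // The input row key becomes an index into an output vector of size numRows,
            // so it is the only value that gets range-checked here.
            if (r < 0 || r >= numRows) {
                throw new IndexOutOfBoundsException(
                    "Index " + r + " is outside allowable range of [0," + numRows + ")");
            }
            for (Map.Entry<Integer, Double> e : row.getValue().entrySet()) {
                // The input column index becomes an output row key and is not checked.
                out.computeIfAbsent(e.getKey(), k -> new TreeMap<>()).put(r, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numRows = 14;
        int numCols = 31;
        Map<Integer, Map<Integer, Double>> m = new TreeMap<>();
        Map<Integer, Double> row0 = new TreeMap<>();
        row0.put(31, 1.0); // assumed bad entry: valid columns are 0..30
        m.put(0, row0);

        // First transpose: 31 is only used as a row key, so nothing fails.
        Map<Integer, Map<Integer, Double>> t1 = transpose(m, numRows);
        try {
            // Second transpose: row key 31 is now written as an index into a size-31 vector.
            transpose(t1, numCols);
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("second transpose failed: " + ex.getMessage());
        }
    }
}
```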

So a few quick questions:

i) What's the structure of the input matrix?  
ii) Have you confirmed that the output of a single transpose is correct?
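
One way to answer both questions on a small sample is to load the rows into memory and
check every row key and column index against the declared dimensions. The sketch below is
plain Java with hypothetical names (IndexRangeCheck, findOutOfRange) and a made-up
map-of-maps representation. Reading the actual SequenceFile input or the transpose output
is left out, since that depends on your setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexRangeCheck {
    // Returns a description of every row key or column index outside the declared shape.
    static List<String> findOutOfRange(
            Map<Integer, Map<Integer, Double>> rows, int numRows, int numCols) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, Map<Integer, Double>> row : rows.entrySet()) {
            int r = row.getKey();
            if (r < 0 || r >= numRows) {
                problems.add("row key " + r + " not in [0," + numRows + ")");
            }
            for (int c : row.getValue().keySet()) {
                if (c < 0 || c >= numCols) {
                    problems.add("row " + r + ": column " + c + " not in [0," + numCols + ")");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Made-up sample data for a matrix declared as 14 x 31.
        Map<Integer, Map<Integer, Double>> rows = new TreeMap<>();
        Map<Integer, Double> row0 = new TreeMap<>();
        row0.put(0, 1.0);
        row0.put(31, 2.0); // out of range for 31 columns
        rows.put(0, row0);
        for (String problem : findOutOfRange(rows, 14, 31)) {
            System.out.println(problem);
        }
    }
}
```

For the output of a single transpose, the same check applies with the dimensions swapped
(31 rows and 14 columns in this case).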

--Peter

-----Original Message-----
From: Laszlo Dosa [mailto:laszlo.dosa@fredhopper.com] 
Sent: Wednesday, July 07, 2010 6:39 AM
To: user@mahout.apache.org
Subject: DistributedRowMatrix.transpose().transpose() = Exception

Hi,

As far as I know, if I transpose a matrix twice, I should get back the original matrix.

I tried to do this with DistributedRowMatrix (trunk version).  My sample matrix has 14 rows
and 31 columns.

I got the following exception:
org.apache.mahout.math.IndexException: Index 31 is outside allowable range of [0,31]
                at org.apache.mahout.math.AbstractVector.set(AbstractVector.java:324)
                at org.apache.mahout.math.SequentialAccessSparseVector.<init>(SequentialAccessSparseVector.java:69)
                at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:144)
                at org.apache.mahout.math.hadoop.TransposeJob$TransposeReducer.reduce(TransposeJob.java:1)
                at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
                at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
                at org.apache.hadoop.mapred.Child.main(Child.java:170)

Exception in thread "main" java.io.IOException: Job failed!
                at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1293)
                at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:153)
                at com.fredhopper.MatrixTransposeJob.run(MatrixTransposeJob.java:46)
                at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
                at com.fredhopper.MatrixTransposeJob.main(MatrixTransposeJob.java:52)
                at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
                at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
                at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
                at java.lang.reflect.Method.invoke(Method.java:597)
                at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Has anyone had the same issue, or does anyone know how to solve it?

Thanks,
Laszlo

I ran:
hadoop jar matrix-transpose.jar \
com.fredhopper.MatrixTransposeJob \
-i input/ \
-o output/ \
--numRows 14 \
--numCols 31

My code is:
package com.fredhopper;

import java.io.IOException;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

public class MatrixTransposeJob extends AbstractJob {

  @SuppressWarnings("deprecation")
  @Override
  public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
    addInputOption();
    addOutputOption();

    addOption("numRows", "nr", "Number of rows of the input matrix");
    addOption("numCols", "nc", "Number of columns of the input matrix");

    Configuration originalConfig = getConf();

    Map<String,String> parsedArgs = parseArguments(args);
    if (parsedArgs == null) {
      return -1;
    }

    Path inputPath = getInputPath();
    Path outputPath = getOutputPath();

    int numRows = Integer.parseInt(parsedArgs.get("--numRows"));
    int numCols = Integer.parseInt(parsedArgs.get("--numCols"));

    DistributedRowMatrix matrix = new DistributedRowMatrix(inputPath, outputPath, numRows, numCols);

    JobConf conf = new JobConf(originalConfig);
    matrix.configure(conf);

    // Transpose twice; the result should equal the original matrix.
    DistributedRowMatrix t1 = matrix.transpose();
    DistributedRowMatrix t2 = t1.transpose();

    return 0;
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new MatrixTransposeJob(), args);
  }

}




