mahout-user mailing list archives

From "ahmed.nagy" <ahmed_said_n...@hotmail.com>
Subject Distributed Matrix Multiplication and operations
Date Mon, 10 Jan 2011 12:16:19 GMT

I am implementing a matrix factorisation technique for matrices that do not
fit in the memory of a single node. I have checked the documentation and the
book Mahout in Action for the distributed matrix operations in
DistributedRowMatrix. I need to carry out some distributed matrix operations,
and I have designed the algorithm as follows:
Take three matrices A, B and C
Divide the matrix A into chunks
Divide the matrix C into chunks
Map the chunks of A and C, together with the matrix B
Compute the updates
Reduce the matrix C, then compute the matrix B
Repeat the above set of operations for Maxiterations iterations
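As a rough illustration of the chunking steps above, here is a minimal single-machine sketch in plain Java. It assumes, purely for illustration, that the "update" is an ordinary product C = A x B computed one row chunk at a time; the real factorisation update is not specified here. The class name ChunkedMultiply, its method names and the chunk size are hypothetical and are not part of Mahout or Hadoop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a single-machine stand-in for the map/reduce steps,
// assuming the update is a plain product C = A x B. Not Mahout's API.
class ChunkedMultiply {

    // "Divide the matrix A into chunks": split the rows of a into blocks
    // of at most chunkRows rows each (row references are shared, not copied).
    static List<double[][]> splitRows(double[][] a, int chunkRows) {
        if (chunkRows <= 0) {
            throw new IllegalArgumentException("chunkRows must be positive");
        }
        List<double[][]> chunks = new ArrayList<>();
        for (int start = 0; start < a.length; start += chunkRows) {
            int end = Math.min(start + chunkRows, a.length);
            chunks.add(Arrays.copyOfRange(a, start, end));
        }
        return chunks;
    }

    // "Map": one chunk of A plus the whole of B yields the matching rows of C.
    // Assumes b is non-empty and rectangular.
    static double[][] multiplyChunk(double[][] aChunk, double[][] b) {
        int inner = b.length;
        int cols = b[0].length;
        double[][] cChunk = new double[aChunk.length][cols];
        for (int i = 0; i < aChunk.length; i++) {
            for (int k = 0; k < inner; k++) {
                double aik = aChunk[i][k];
                for (int j = 0; j < cols; j++) {
                    cChunk[i][j] += aik * b[k][j];
                }
            }
        }
        return cChunk;
    }

    // "Reduce": concatenate the per-chunk results back into C, in chunk order.
    static double[][] multiplyInChunks(double[][] a, double[][] b, int chunkRows) {
        double[][] c = new double[a.length][];
        int row = 0;
        for (double[][] aChunk : splitRows(a, chunkRows)) {
            for (double[] cRow : multiplyChunk(aChunk, b)) {
                c[row++] = cRow;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}, {5, 6}};
        double[][] b = {{2, 0}, {1, 3}};
        System.out.println(Arrays.deepToString(multiplyInChunks(a, b, 2)));
    }
}
```

In a real Hadoop job each chunk would be handled by a separate mapper and B would have to be made available to every mapper; the sketch only shows how the row chunks of A line up with the row chunks of C.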
1- Do I need to distribute the matrices on the cluster if I am carrying out
these operations?
2- How can I control the amount of parallelism, for example the number of
mappers?
3- When I used the constructor of DistributedRowMatrix
DistributedRowMatrix m = new DistributedRowMatrix("path/to/vector/sequenceFile", "tmp/path", 10000000, 250000);
from the example found at

https://hudson.apache.org/hudson/job/Mahout-Quality/javadoc/org/apache/mahout/math/hadoop/DistributedRowMatrix.html#getOutputTempPath()

I get the error: The constructor DistributedRowMatrix(String, String, int, int) is
undefined.
I dug a bit and found that the example passes two strings as the first two
parameters, whereas the constructor expects them to be of type Path. I tried
to define and initialise them like this:
Path in = new Path("path/to/vector/sequenceFile");
Path out = new Path("/tmp/path");
then I passed in and out as parameters:
DistributedRowMatrix m = new DistributedRowMatrix(in, out, 10000000, 250000);
4- Another point is that m.configure(new JobConf()); produces a warning that
JobConf is deprecated.
5- Is there any side effect from using the deprecated JobConf?
6- Could anybody point me to how to package this job and run it on a
cluster?
7- Also, I am not sure how to pass the sequence file when it resides on
HDFS.
Sorry if some of the questions look naive.
I appreciate any insights.
Regards
Ahmed Nagy


