mahout-user mailing list archives

From Han Fan <visaya...@gmail.com>
Subject Mahout DistributedRowMatrix run with only one mapper
Date Tue, 17 Jun 2014 07:05:24 GMT
I have a 6k x 10k matrix T and I need to compute T'*T, which should be
10k x 10k. I want to do this with Mahout's DistributedRowMatrix, but I
found that Hadoop runs the job with only one mapper, which is very slow.
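
To make the expectation concrete: T'*T equals the sum of the outer products of the rows of T, so in principle partial results from disjoint groups of rows can be computed independently and added together. The sketch below is plain Python on a tiny made-up matrix, not Mahout code, and the helper names are only illustrative.

```python
def gram_by_rows(rows, n_cols):
    """Accumulate T'T as the sum of outer products of the given rows."""
    result = [[0.0] * n_cols for _ in range(n_cols)]
    for row in rows:
        for i, a in enumerate(row):
            if a == 0:
                continue  # zero entries contribute nothing to row i
            for j, b in enumerate(row):
                result[i][j] += a * b
    return result


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Tiny 4 x 3 stand-in for the 6k x 10k matrix (values are made up).
T = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [4.0, 0.0, 1.0],
     [2.0, 2.0, 2.0]]

# One "mapper" over all rows versus two "mappers" over half the rows each.
whole = gram_by_rows(T, 3)
split = add_matrices(gram_by_rows(T[:2], 3), gram_by_rows(T[2:], 3))
```

Both paths give the same 3 x 3 result, which is why more than one mapper seems reasonable to expect for this product.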

I dug into the source code of DistributedRowMatrix and found that the
input format it uses is CompositeInputFormat.class, whose getSplits
method sets mapred.min.split.size to Long.MAX_VALUE.
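
A rough sketch of why that setting would leave one split per input file, assuming the split size follows the old mapred FileInputFormat rule max(minSize, min(goalSize, blockSize)). The file size, block size, and map-task hint below are assumed numbers for illustration, and the split count is simplified to a plain ceiling.

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def compute_split_size(goal_size, min_size, block_size):
    """Split-size rule as assumed above: max(minSize, min(goalSize, blockSize))."""
    return max(min_size, min(goal_size, block_size))


def count_splits(file_size, split_size):
    """Simplified split count: ceiling of file_size / split_size, at least 1."""
    return max(1, -(-file_size // split_size))


MB = 1024 * 1024
file_size = 480 * MB       # assumed: 6k x 10k dense doubles in one file
block_size = 64 * MB       # assumed HDFS block size
goal_size = file_size // 8 # assumed hint of 8 map tasks

# With a small minimum split size the file is cut into several splits.
default_splits = count_splits(
    file_size, compute_split_size(goal_size, 1, block_size))

# With the minimum forced to Long.MAX_VALUE the whole file is one split.
forced_splits = count_splits(
    file_size, compute_split_size(goal_size, LONG_MAX, block_size))
```

Under these assumptions the first case yields 8 splits and the second yields 1, which matches the single mapper I am seeing when the matrix is stored in one file.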

So my question is: is DistributedRowMatrix only a demo showing that
matrix multiplication can be done with MapReduce, with no practical
value? Is there any way to do matrix multiplication quickly using Hadoop?

Thanks for your time and sorry for my broken English.

