mahout-user mailing list archives

From lei tang <find.lt...@gmail.com>
Subject SSVD too slow to handle large matrix?
Date Fri, 14 Sep 2012 20:24:50 GMT
Hi,

I am using Mahout's SSVD (stochastic SVD) to factorize a huge sparse
matrix (around 30M x 1M). I used a modified version of the script from
http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html
to store the input matrix as <key, value> pairs, where the key is an
integer and the value is a VectorWritable (in particular, a
SequentialAccessSparseVector). Should I change to RandomAccessSparseVector?

I managed to run Mahout SSVD with the following command:
mahout ssvd -Dmapred.max.split.size=1000000 -i mf/tr_full.seq -o
mf/out_full -k 200 -p 100 -r 100000 -U true -V true -t 20 --tempDir mf/tmp

I set the max split size in order to get more mappers. However, the
first Q job does not seem to be progressing: after 1 hour, it is still at
12% with 100 mappers. Is this expected? Should I change any parameters?
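
For a rough sense of scale, here is a back-of-the-envelope sketch of the sizes implied by the parameters in the command above. It assumes that SSVD projects the input onto k + p random columns, that -r is the block height in rows, and that values are stored as 8-byte doubles. The function name and the assumption that a full block is held densely in memory are only for illustration and are not taken from Mahout's code.

```python
def projection_stats(rows, k, p, block_height, bytes_per_double=8):
    """Estimate sizes for a k+p column random projection (illustrative only)."""
    width = k + p                                    # rank plus oversampling
    dense_bytes = rows * width * bytes_per_double    # full projected matrix
    block_bytes = block_height * width * bytes_per_double  # one dense block
    return width, dense_bytes, block_bytes


# Values from the command line: 30M rows, -k 200, -p 100, -r 100000
width, dense_bytes, block_bytes = projection_stats(30_000_000, 200, 100, 100_000)
print("projection width:", width)
print("dense projection size (GB):", dense_bytes / 1e9)
print("one block size (MB):", block_bytes / 1e6)
```

Under these assumptions, the projected matrix alone is tens of gigabytes, so the choice of -k, -p and -r may matter as much as the number of mappers.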

Any suggestions would be highly appreciated.

- Lei
P.S. I'm also reading the docs from
https://issues.apache.org/jira/browse/MAHOUT-376 in the hope that I can
figure out why it is so slow.
