Thanks Ted,
On Sun, Sep 5, 2010 at 2:05 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> I don't think anybody has done anything on quite that scale, though Jake
> may
> have come relatively close.
>
> There are several scaling limits. These include:
>
>  the total number of nonzero elements. This drives the scan time and, to
> some extent, the cost of the multiplies.
>
The total number of nonzero elements is small, since most Twitter users
follow around 100 other users on average.
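As a rough back-of-envelope sketch of what that implies for the data volume (the user count and average follow count are the figures from this thread; the 12-bytes-per-entry sparse layout is an assumption):

```python
# Assumed figures from the thread: ~35 million users, each following
# ~100 other users on average.
users = 35_000_000
avg_follows = 100
nonzeros = users * avg_follows  # nonzero entries in the follower matrix

# Assuming a sparse row format with a 4-byte column index plus an
# 8-byte double value per entry (12 bytes each):
bytes_per_entry = 12
total_gb = nonzeros * bytes_per_entry / 1e9
print(f"~{nonzeros:,} nonzeros, roughly {total_gb:.0f} GB of sparse data")
```

So on the order of 3.5 billion nonzeros, a few tens of gigabytes of raw matrix data, which a modest Hadoop cluster can scan.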
>  the total number of singular vectors desired. This directly drives the
> number of iterations in the Hebbian approach and drives the size of
> intermediate products in the random projection techniques. It also causes
> product scaling with the next factor.
>
I plan to compute around 50-200 singular vectors.
>
>  the number of columns in the original matrix. This, multiplied by the
> number of singular vectors drives the memory cost of some approaches in the
> final step or in the SVD step for the random projection.
>
The number of columns in the matrix is ~47 million.
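Combining that column count with the planned number of singular vectors gives a quick estimate of the dense right-singular-vector matrix that the memory cost above refers to (figures are the thread's; dense double-precision storage is an assumption):

```python
# Memory for the dense (columns x k) right-singular-vector matrix,
# assuming double precision. ~47 million columns, k = 50 to 200
# singular vectors, as discussed in the thread.
columns = 47_000_000
bytes_per_double = 8
for k in (50, 200):
    gb = columns * k * bytes_per_double / 1e9
    print(f"k={k}: ~{gb:.0f} GB for the singular vectors")
```

That is roughly 19 GB at k=50 and 75 GB at k=200, so at the high end this factor alone exceeds the per-node RAM and has to stay distributed.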
>  the number of rows in the original matrix. This is a secondary factor
> that can drive some intermediate products in the random projection.
>
The number of rows is around 35 million.
> Which of these will hang you up in your problem is an open question. There
> is always the factor I haven't thought about yet.
>
> Jake, do you have any thoughts on this?
>
>
I believe the Twitter data set would be a good stress test for the SVD
algorithm. I should hopefully get access to the cluster by next week.
On Sat, Sep 4, 2010 at 5:08 PM, Akshay Bhat <akshayubhat@gmail.com> wrote:
>
> > Hello,
> > Has anyone attempted SVD with a really large matrix (~40 million rows
> > and columns, to be specific) using Mahout?
> > I am planning to perform SVD using mahout on Twitter Follower network (it
> > contains information about ~35 Million users following ~45 million users
> > http://an.kaist.ac.kr/traces/WWW2010.html ) and I should have access to
> > the Cornell Hadoop cluster (55 quad-core nodes with 16-18GB of RAM per
> > node). Can anyone estimate how long the job will run?
> > Also, is it possible to perform regularized SVD, or will I need to add
> > that functionality by modifying the code?
> > Thank you
> >
> >
> > 
> > Akshay Uday Bhat.
> > Graduate Student, Computer Science, Cornell University
> > Website: http://www.akshaybhat.com
> >
>
Thanks

Akshay Uday Bhat.
Graduate Student, Computer Science, Cornell University
Website: http://www.akshaybhat.com
