mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Computing SVD Of "Large Sparse Data"
Date Sat, 04 Jun 2011 01:26:00 GMT
What you probably need to worry about is not the number of
dimensions but the average number of non-zero elements per row
(density). How dense is the data?
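
To make the density question concrete, here is a minimal sketch of how
the average number of non-zero elements per row could be measured on a
sample of the data. Everything in it is an assumption for illustration:
the function name row_density_stats is hypothetical, the rows are assumed
to be dicts mapping a column index to a stored non-zero value, and the
three sample rows are made up. It does not use any Mahout API.

```python
def row_density_stats(rows, num_cols):
    """Return (average non-zeros per row, fraction of cells that are non-zero).

    rows: iterable of dicts mapping column index -> non-zero value
          (assumed representation; only non-zero entries are stored).
    num_cols: total number of columns (dimensions) in the matrix.
    """
    num_rows = 0
    total_nnz = 0
    for row in rows:
        num_rows += 1
        total_nnz += len(row)  # each stored entry counts as one non-zero
    avg_nnz = total_nnz / num_rows if num_rows else 0.0
    density = avg_nnz / num_cols if num_cols else 0.0
    return avg_nnz, density


# Made-up sample of three sparse rows, with the 50 K dimensions from the question.
sample = [{0: 1.0, 7: 2.0}, {3: 1.0}, {1: 1.0, 2: 1.0, 9: 4.0}]
avg_nnz, density = row_density_stats(sample, num_cols=50_000)

# Extrapolate the sample average to the 100 million rows mentioned in the question.
estimated_total_nnz = avg_nnz * 100_000_000
print(avg_nnz, density, estimated_total_nnz)
```

The extrapolated total of non-zero elements, rather than the nominal
100 million x 50,000 cell count, is the number that describes how much
data there actually is to process.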



On Fri, Jun 3, 2011 at 4:48 PM, Eshwaran Vijaya Kumar
<evijayakumar@mozilla.com> wrote:
> Hello all,
> We are trying to build a clustering system which will have an SVD component. I believe
> Mahout has two SVD solvers: DistributedLanczosSolver and SSVD. Could someone give me some
> tips on which would be the better choice of solver, given that the data will be
> roughly 100 million rows, each with roughly 50 K dimensions (100 million x 50,000)?
> We will be working with text data, so the resulting matrix should be relatively sparse to
> begin with.
>
> Thanks
> Eshwaran
