mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: is it possible to compute the SVD for a large scale matrix
Date Fri, 25 Mar 2011 17:33:12 GMT
Wei,

1) I think DenseMatrix is a RAM-only representation. Naturally, you
get an OOM because it all has to fit in memory. If you want to run a
RAM-only SVD computation, you perhaps don't need Mahout. If you want
to run distributed SVD computations, you need to prepare your data in
what is called the DistributedRowMatrix (DRM) format. This is a
sequence file whose keys are whatever you need to identify your rows,
and whose values are VectorWritable wrapping any of the Vector
implementations found in Mahout (dense, sparse sequential, sparse
random).
2) Once you've prepared your data in DRM format, you can run either of
the SVD algorithms found in Mahout. It can be the Lanczos solver
("mahout svd ...") or, on the trunk, you can also find a stochastic
SVD method ("mahout ssvd ..."), which is the issue MAHOUT-593 I
mentioned earlier.
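For text input, step 1) can be done with the standard Mahout driver commands. A rough sketch only -- the exact flag spellings and intermediate paths vary between Mahout versions (the paths below are placeholders), so verify each step with the command's --help:

```shell
# Sketch: turning a directory of text files into a DRM-format matrix.
# Flag names follow the Mahout 0.x command line and may differ in your
# build -- check `mahout <command> --help` before running.

# 1. Pack the raw text files into a SequenceFile of (docId, text).
mahout seqdirectory -i /path/to/text -o /path/to/seqfiles

# 2. Vectorize: produces SequenceFiles of (Text key, VectorWritable value).
mahout seq2sparse -i /path/to/seqfiles -o /path/to/vectors

# 3. Re-key the rows with integer indexes so the output is usable as
#    DistributedRowMatrix input.
mahout rowid -i /path/to/vectors/tfidf-vectors -o /path/to/matrix
```

If your data is not text, any job that writes a SequenceFile of (key, VectorWritable) pairs produces the same DRM format.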

Either way, I am not sure why you want a DenseMatrix unless you want
to use the RAM-only Colt SVD solver -- but you certainly don't have to
use Mahout's wrapper of it if you just want a RAM solver.
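If you do go the distributed route, step 2) comes down to a single driver invocation. A sketch, assuming the MAHOUT-593 trunk build -- the option names here (-k for rank, -p for oversampling, -q for power iterations) are my recollection of that patch and should be confirmed with the command's --help:

```shell
# Sketch: stochastic SVD over a DRM input. -k is the decomposition rank,
# -p the oversampling parameter, -q the number of power iterations.
# These option names are assumptions -- verify with `mahout ssvd --help`.
mahout ssvd -i /path/to/matrix -o /path/to/ssvd-out -k 100 -p 15 -q 1
```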

-d

On Fri, Mar 25, 2011 at 3:25 AM, Wei Li <wei.lee04@gmail.com> wrote:
>
> Actually, I would like to perform spectral clustering on a large-scale
> sparse matrix, but it failed with an OutOfMemory error when creating the
> DenseMatrix for the SVD decomposition.
>
> Best
> Wei
>
> On Fri, Mar 25, 2011 at 4:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>> SSVD != Lanczos. If you do PCA or LSI, it is perhaps what you need. It
>> can take on matrices of this size. Well, at least some of my branches
>> can, if not the official patch.
>>
>> -d
>>
>> On Thu, Mar 24, 2011 at 11:09 PM, Wei Li <wei.lee04@gmail.com> wrote:
>> >
>> > Thanks for your reply.
>> >
>> > My matrix is not very dense; it is a sparse matrix.
>> >
>> > I have tried Mahout's svd, but it failed with an OutOfMemory error.
>> >
>> > Best
>> > Wei
>> >
>> >
>> >
>> > On Fri, Mar 25, 2011 at 2:03 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>> > wrote:
>> >>
>> >> You can certainly try to write it out into a DRM (distributed row
>> >> matrix) and run stochastic SVD on Hadoop (off the trunk now); see
>> >> MAHOUT-593. This is suitable if you have a good decay of singular
>> >> values (but if you don't, it probably just means you have so much
>> >> noise that it masks the problem you are trying to solve in your
>> >> data).
>> >>
>> >> The currently committed solution is not the most efficient yet, but
>> >> it should be quite capable.
>> >>
>> >> If you do, let me know how it went.
>> >>
>> >> thanks.
>> >> -d
>> >>
>> >> On Thu, Mar 24, 2011 at 10:59 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
>> >> wrote:
>> >> > Are you sure your matrix is dense?
>> >> >
>> >> > On Thu, Mar 24, 2011 at 9:59 PM, Wei Li <wei.lee04@gmail.com> wrote:
>> >> >> Hi All:
>> >> >>
>> >> >>    Is it possible to compute the SVD factorization for a
>> >> >> 600,000 * 600,000 matrix using Mahout?
>> >> >>
>> >> >>    I got an OutOfMemory error when creating the DenseMatrix.
>> >> >>
>> >> >> Best
>> >> >> Wei
>> >> >>
>> >> >
>> >
>> >
>
>
