mahout-user mailing list archives

From Danny Bickson <danny.bick...@gmail.com>
Subject Re: is it possible to compute the SVD for a large scale matrix
Date Wed, 06 Apr 2011 10:55:29 GMT
Did you try increasing the Java heap allocation for the child processes,
mapred.child.java.opts (found in the conf/mapred-site.xml config file)?
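
For reference, a minimal sketch of what that setting could look like. The 2048 MB value is only an illustrative assumption and should be sized to the memory actually available on each task node; Hadoop releases of this era ship with a much smaller default (-Xmx200m, as far as I know).

```xml
<!-- conf/mapred-site.xml: heap size for each map/reduce child JVM -->
<!-- The -Xmx value below is a placeholder assumption, not a recommendation. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```

The new value only applies to jobs submitted after the change.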

Do you mean 60 Million by 60 Million?
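
For a sense of scale, here is a rough back-of-the-envelope sketch in plain Java (not Mahout code) of why a dense in-memory matrix cannot work at the 600,000 x 600,000 size given in the original post. It assumes 8-byte doubles and ignores object overhead. The sparse estimate assumes a hypothetical 100 non-zeros per row, each stored as one double plus one int index. The class and method names are made up for illustration.

```java
public class DenseVsSparseFootprint {

    // Bytes needed to hold every entry of a rows x cols matrix as 8-byte doubles.
    public static long denseBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) Double.BYTES);
    }

    // Rough sparse payload: one 8-byte value plus one 4-byte index per non-zero entry.
    public static long sparseBytes(long nonZeros) {
        return Math.multiplyExact(nonZeros, (long) (Double.BYTES + Integer.BYTES));
    }

    public static void main(String[] args) {
        long n = 600_000L;        // dimension from the original question
        long nnzPerRow = 100L;    // hypothetical density, for illustration only

        System.out.println("dense bytes:  " + denseBytes(n, n));
        System.out.println("sparse bytes: " + sparseBytes(n * nnzPerRow));
    }
}
```

The dense figure comes to 2,880,000,000,000 bytes (about 2.88 TB), far beyond any single JVM heap, while the sparse figure under the assumed density is 720,000,000 bytes (about 0.72 GB). Raising the child heap therefore only helps if the data stays in a sparse representation.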




On Wed, Apr 6, 2011 at 2:13 AM, Wei Li <wei.lee04@gmail.com> wrote:

> Hi Danny:
>
>      I have transformed the csv data into the DistributedRowMatrix format,
> but it still failed due to the memory problem after 2 or 3 iterations.
>
>      my matrix dimension is about 60w * 60w, is it possible to do the svd
> decomposition for this scale using Mahout?
>
> Best
> Wei
>
>
> On Sat, Mar 26, 2011 at 1:43 AM, Danny Bickson <danny.bickson@gmail.com> wrote:
>
>> Hi Wei,
>> You must verify that you use a SPARSE matrix and not a dense one, or else
>> you will surely run out of memory.
>> Take a look at this example of how to prepare the input:
>> http://bickson.blogspot.com/2011/02/mahout-svd-matrix-factorization.html
>>
>> Best,
>>
>> Danny Bickson
>>
>>
>> On Fri, Mar 25, 2011 at 1:33 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> Wei,
>>>
>>> 1) i think DenseMatrix is a RAM-only representation. Naturally, you
>>> get OOM because it all has to fit in memory. If you want to run
>>> RAM-only SVD computation, you perhaps don't need Mahout. If you want
>>> to run distributed SVD computations, you need to prepare your data in
>>> what is called DistributedRowMatrix format. This is a sequence file
>>> with keys being whatever key you need to identify your rows, and
>>> values being VectorWritable wrapping any of the vector implementations
>>> found in Mahout (dense, sparse sequential, sparse random-access).
>>> 2) Once you've prepared your data in DRM format, you can run either of
>>> the SVD algorithms found in Mahout. It can be the Lanczos solver
>>> ('mahout svd ...') or, on the trunk, you can also find a stochastic SVD
>>> method ('mahout ssvd ...'), which is issue MAHOUT-593 i mentioned earlier.
>>>
>>> Either way, I am not sure why you want DenseMatrix unless you want to
>>> use RAM-only Colt SVD solver -- but you certainly don't have to focus
>>> on Mahout implementation of one if you just want a RAM solver.
>>>
>>> -d
>>>
>>> On Fri, Mar 25, 2011 at 3:25 AM, Wei Li <wei.lee04@gmail.com> wrote:
>>> >
>>> > Actually, I would like to perform the spectral clustering on a large
>>> > scale sparse matrix, but it failed due to the OutOfMemory error when
>>> > creating the DenseMatrix for SVD decomposition.
>>> >
>>> > Best
>>> > Wei
>>> >
>>> > On Fri, Mar 25, 2011 at 4:05 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>> >>
>>> >> SSVD != Lanczos. if you do PCA or LSI it is perhaps what you need. it
>>> >> can take on these things. Well at least some of my branches can, if
>>> >> not the official patch.
>>> >>
>>> >> -d
>>> >>
>>> >> On Thu, Mar 24, 2011 at 11:09 PM, Wei Li <wei.lee04@gmail.com> wrote:
>>> >> >
>>> >> > thanks for your reply
>>> >> >
>>> >> > my matrix is not very dense, a sparse matrix.
>>> >> >
>>> >> > I have tried the svd of Mahout, but failed due to the OutOfMemory
>>> >> > error.
>>> >> >
>>> >> > Best
>>> >> > Wei
>>> >> >
>>> >> >
>>> >> >
>>> >> > On Fri, Mar 25, 2011 at 2:03 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>> >> >>
>>> >> >> you can certainly try to write it out into a DRM (distributed row
>>> >> >> matrix) and run stochastic SVD on hadoop (off the trunk now). see
>>> >> >> MAHOUT-593. This is suitable if you have a good decay of singular
>>> >> >> values (but if you don't it probably just means you have so much
>>> >> >> noise that it masks the problem you are trying to solve in your data).
>>> >> >>
>>> >> >> The currently committed solution is not the most efficient yet, but
>>> >> >> it should be quite capable.
>>> >> >>
>>> >> >> If you do, let me know how it went.
>>> >> >>
>>> >> >> thanks.
>>> >> >> -d
>>> >> >>
>>> >> >> On Thu, Mar 24, 2011 at 10:59 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>> >> >> > Are you sure your matrix is dense?
>>> >> >> >
>>> >> >> > On Thu, Mar 24, 2011 at 9:59 PM, Wei Li <wei.lee04@gmail.com> wrote:
>>> >> >> >> Hi All:
>>> >> >> >>
>>> >> >> >>    is it possible to compute the SVD factorization for a
>>> >> >> >> 600,000 * 600,000 matrix using Mahout?
>>> >> >> >>
>>> >> >> >>    I have got the OutOfMemory error when creating the
>>> >> >> >> DenseMatrix.
>>> >> >> >>
>>> >> >> >> Best
>>> >> >> >> Wei
>>> >> >> >>
>>> >> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>>
>>
>>
>
