mahout-user mailing list archives

From Vckay <>
Subject Re: Question Regarding Distributed Row Matrix
Date Thu, 05 May 2011 18:05:56 GMT
On Thu, May 5, 2011 at 12:22 PM, Jake Mannix <> wrote:

> On Thu, May 5, 2011 at 8:24 AM, Vckay <> wrote:
> > So I am trying to build PCA. I was recommended in a previous thread that
> > it was better that my data is available at the start as a distributed row
> > matrix. The work flow (already posted in a previous thread) would be:
> > 1. Get the data into distributed row matrix format.
> > 2. Compute empirical mean vector.
> >
> Note that as we've mentioned in other threads, this step:
I know what you guys were saying in the previous thread. I believe I did
mention that I would be working with image data, which is overwhelmingly
dense, meaning that even if I did subtract the mean, I would essentially
get a sparse matrix. In fact, running SVD separately on the matrix and the
low-rank matrix (e*m') would probably be a bad idea in this case, because
you would end up having to run the code on a dense matrix.
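
To make the two options concrete, here is a minimal sketch in plain Python.
It is not Mahout code: nested lists stand in for the distributed row matrix,
and every function name is hypothetical. It computes the empirical mean
vector m, then multiplies the centered matrix by a vector in two ways: by
materializing A - e*m' first, and by using (A - e*m')v = Av - e(m'v), which
never forms the centered matrix.

```python
# Illustrative only: plain lists stand in for a distributed row matrix,
# and all helper names below are assumptions, not Mahout APIs.

def column_means(rows):
    """Empirical mean vector m: the average of each column over all rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def center_explicitly(rows, m):
    """Materialize A - e*m' by subtracting m from every row."""
    return [[x - mj for x, mj in zip(row, m)] for row in rows]

def times_vector(rows, v):
    """Matrix-vector product, one dot product per row."""
    return [sum(x * vj for x, vj in zip(row, v)) for row in rows]

def centered_times_vector(rows, m, v):
    """Compute (A - e*m')v as Av - e(m'v) without forming A - e*m'."""
    shift = sum(mj * vj for mj, vj in zip(m, v))
    return [y - shift for y in times_vector(rows, v)]

A = [[1.0, 2.0, 3.0],
     [4.0, 6.0, 8.0],
     [7.0, 1.0, 4.0]]
v = [1.0, -1.0, 2.0]

m = column_means(A)
explicit = times_vector(center_explicitly(A, m), v)
implicit = centered_times_vector(A, m, v)
```

Both routes give the same product. The implicit form mainly pays off when A
is sparse and centering would destroy that sparsity; when A is dense to
begin with, the centered matrix takes the same storage as A.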
