On Thu, May 5, 2011 at 12:22 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> On Thu, May 5, 2011 at 8:24 AM, Vckay <darkvckay@gmail.com> wrote:
>
> > So I am trying to build PCA. I was recommended in a previous thread that it
> > was better that my data is available at the start as a distributed row
> > matrix. The work flow (already posted in a previous thread) would be:
> > 1. Get the data into distributed row matrix format.
> > 2. Compute empirical mean vector.
> >
>
> Note that as we've mentioned in other threads, this step:
>
>
>
I know what you guys were saying in the previous thread. I believe I did
mention that I would be working with image data that is overwhelmingly
dense, meaning that even if I did subtract the mean, I would essentially
get a sparse matrix. In fact, running SVD separately on the matrix and the
low-rank matrix (e*m') would probably be a bad idea in this case, because
you would end up having to run the code on a dense matrix.
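
To make the two options concrete, here is a rough sketch of what I understand
them to be. It is plain Python, with lists standing in for the distributed row
matrix, and is not Mahout code; all function names are made up for
illustration. Here e is the all-ones column vector and m is the empirical mean
vector, so e*m' is the rank-one matrix whose every row is the mean.

```python
# Sketch only: plain lists stand in for a distributed row matrix.

def column_means(rows):
    """Empirical mean vector m: the average of each column over all rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def center_explicit(rows, m):
    """Option 1: materialize A - e*m' by subtracting m from every row."""
    return [[a - mj for a, mj in zip(row, m)] for row in rows]

def centered_times_vector(rows, m, x):
    """Option 2: compute (A - e*m') x as A x - e (m . x), leaving A untouched."""
    shift = sum(mj * xj for mj, xj in zip(m, x))
    return [sum(a * xj for a, xj in zip(row, x)) - shift for row in rows]

A = [[1.0, 2.0, 3.0],
     [4.0, 6.0, 8.0]]
m = column_means(A)
x = [1.0, 0.0, -1.0]

# Both routes give the same product with the centered matrix.
explicit = [sum(c * xj for c, xj in zip(row, x))
            for row in center_explicit(A, m)]
implicit = centered_times_vector(A, m, x)
```

Both give the same matrix-vector product. The only difference is which matrix
the solver actually has to iterate over: the centered one in option 1, the
original A in option 2.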
