> > So I am trying to build PCA. It was recommended in a previous thread that
> > it would be better to have my data available from the start as a distributed
> > row matrix. The workflow (already posted in a previous thread) would be:
> > 1. Get the data into distributed row matrix format.
> > 2. Compute empirical mean vector.
> Note that as we've mentioned in other threads, this step:
I know what you guys were saying in the previous thread. I believe I did
mention that since I would be working with image data that is overwhelmingly
dense, subtracting the mean would cost me nothing: I would essentially
still get a dense matrix. In fact, running SVD separately on the
matrix and the low-rank matrix (e*m') would probably be a bad idea in this
case, because you would end up having to run the code on a dense matrix
anyway.
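
To make the density point concrete, here is a small sketch in plain Python
(not the distributed code; the helper names and the toy numbers are made up
purely for illustration). It computes the empirical mean vector m, subtracts
it from every row (i.e. forms A - e*m'), and compares the fraction of
nonzero entries before and after for a dense "image-like" matrix and for a
sparse one.

```python
def column_means(rows):
    """Empirical mean vector m: one mean per column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def center(rows):
    """Subtract the mean vector from every row, i.e. form A - e*m'."""
    m = column_means(rows)
    return [[x - mj for x, mj in zip(row, m)] for row in rows]


def density(rows):
    """Fraction of entries that are nonzero."""
    total = sum(len(row) for row in rows)
    nonzero = sum(1 for row in rows for x in row if x != 0)
    return nonzero / total


# Toy "image-like" data: every entry is nonzero (hypothetical values).
dense = [[200, 180, 90],
         [212, 170, 97],
         [191, 178, 101]]

# Toy sparse data: mostly zeros (hypothetical values).
sparse = [[0, 0, 4],
          [0, 3, 0],
          [0, 0, 0],
          [5, 0, 0]]

for name, a in (("dense", dense), ("sparse", sparse)):
    print(name, "before:", density(a), "after centering:", density(center(a)))
```

With these toy values the dense matrix is fully dense before and after
centering, while the sparse matrix goes from mostly zeros to fully dense.
That is the only case where keeping the matrix and e*m' separate would buy
anything, and it is not my case.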
