On Tue, Dec 8, 2009 at 7:27 PM, Zhengguo 'Mike' SUN
<zhengguosun@yahoo.com> wrote:
>
> No, I am not computing entries in XWH. I am doing iterations on the
> following two rules:
>
> H <- H*(WtX)/(WtWH),   W <- W*(XHt)/(WHHt)
>
>
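Just to pin down notation before I respond: here's a minimal NumPy sketch of those multiplicative updates as I understand them (I'm assuming the * and / in your rules are elementwise, and the inner products WtX, WtWH, XHt, WHHt are ordinary matrix products; the small eps and the function name are mine):

```python
import numpy as np

def nmf_multiplicative(X, k, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for a nonnegative factorization X ~= W @ H.

    H <- H * (W'X) / (W'WH)
    W <- W * (XH') / (WHH')
    where * and / are elementwise and eps avoids division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    for _ in range(iters):
        # H <- H * (W'X) / (W'WH), elementwise
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W <- W * (XH') / (WHH'), elementwise
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H
```

Note the updates never change the sign of an entry, so W and H stay nonnegative as long as X is.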
Hmm... I haven't thought about this for more than a few minutes today, but let's
see...
So if your X matrix is N by M, with each of the N rows having an average of d
nonempty values, and you want a rank-k factorization, then producing W'WH and
WHH' is going to take something like O(k^2 * (N+M)) flops each, right?
And you need to iterate this again and again until convergence? How many
iterations does this tend to take? It seems like this algorithm might not
scale terribly well on humongous data sets, at least in comparison to what I
know of the speed of, e.g., SVD via Lanczos or aGHA.
It also seems like this isn't going to respect HDFS locality very much,
because the full W and H matrices are going to need to get moved around a
lot, and continually updated (i.e. replaced) with newer values. Maybe there
are some tricks which can make this run more efficiently, but I would
caution you to think carefully through how you parallelize this: the most
straightforward approach seems like it is going to entail a *lot* of matrix
multiplications.
jake
