mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: SSVD on very large and sparse matrices
Date Fri, 13 Dec 2013 21:17:33 GMT
On Fri, Dec 13, 2013 at 12:42 PM, Ron Ayoub <ronaldayoub@live.com> wrote:

> I'm doing some up-front research on implementing LSI and choice of tools.
> I understand Mahout provides an out-of-core implementation of Stochastic
> SVD. On the web site it uses the words 'reasonable size problems'. Would a
> sparse 1,000,000 x 1,000,000 matrix having some 250,000,000 nonzero entries
> be out of the question?


For a performance/accuracy assessment, Nathan's dissertation [1], pp. 139 and
on, is so far the best source I know.

Nathan compares performance and assesses bottlenecks on at least two
interesting data sets -- wiki and wiki-max. He experienced the bottleneck
in the matrix multiplication operation (though he may have done the testing
before certain improvements were applied to the matrix-matrix part of the
power iterations -- I am still hazy on that).

[1]
http://amath.colorado.edu/faculty/martinss/Pubs/2012_halko_dissertation.pdf
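For intuition about where that bottleneck comes from, here is a minimal,
in-memory sketch of the randomized SVD with power iterations in the style of
Halko et al. It is a toy stand-in, not Mahout's actual SSVD code or API; the
names (rand_svd, k, p, q) and the tiny test matrix are all illustrative. The
point is that each power iteration costs two large matrix-matrix products
against A, which is exactly the step under discussion.

```python
import numpy as np
from scipy.sparse import random as sparse_random

def rand_svd(A, k, p=10, q=1):
    """Rank-k approximate SVD of A via a randomized range finder.

    k: target rank, p: oversampling, q: number of power iterations.
    (Illustrative sketch only -- not Mahout's SSVD implementation.)
    """
    m, n = A.shape
    # Random test matrix; Y = A @ Omega samples the dominant column space.
    Omega = np.random.randn(n, k + p)
    Y = A @ Omega
    # Power iterations sharpen the captured spectrum. Each iteration is two
    # big matrix-matrix products with A -- the bottleneck step.
    for _ in range(q):
        Y = A @ (A.T @ Y)
    # Orthonormal basis for the sampled range.
    Q, _ = np.linalg.qr(Y)
    # Project A into the small subspace and take a cheap dense SVD there.
    B = (A.T @ Q).T            # shape (k+p, n)
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], s[:k], Vt[:k, :]

# Tiny sparse example standing in for the huge term-document matrix.
A = sparse_random(1000, 800, density=0.01, format="csr", random_state=0)
U, s, Vt = rand_svd(A, k=20)
print(U.shape, s.shape, Vt.shape)
```

Note that only matrix-vector-block products with A and A.T are needed, which
is what makes an out-of-core / MapReduce formulation possible at all.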

I have great hope that this bottleneck can be further addressed by taking
MapReduce out of the equation and replacing it with Bagel or GraphX
broadcast operations in the upcoming Spark 0.9. I plan to address that in
the Mahout-on-Spark part of the code, but I am still waiting for the Spark
project to rehash its graph-based computation approach (there is a sense
that GraphX should be superior to the existing Bagel API in its
broadcasting techniques).


> If so, what tools out there can do that. For instance, ARPACK.


AFAIK nobody to date has cared to do the comparison with ARPACK.


> Regardless, how does Mahout SSVD compare to ARPACK? These seem to be the
> options out there that I have found. Thanks.
>
