Of course, for a data set of only 1GB in size, you don't need to mapreduce it.
You can use the regular sparse LanczosSolver in memory, and then you don't
have to worry about those tens of seconds of startup time.
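The thread refers to Mahout's in-memory LanczosSolver (Java); as a hedged illustration of the same idea, here is a sketch using SciPy's `svds`, which also runs a Lanczos-style sparse SVD entirely in memory with no job-startup overhead. The matrix size and rank here are illustrative assumptions, not from the thread.

```python
# Sketch: in-memory Lanczos-style sparse SVD (SciPy stand-in for Mahout's
# LanczosSolver). A ~1GB sparse data set fits in RAM, so no MapReduce needed.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Illustrative sparse matrix: 1000 x 500, ~1% nonzero entries.
A = sparse_random(1000, 500, density=0.01, format="csr", random_state=42)

k = 10  # number of singular values/vectors to compute
u, s, vt = svds(A, k=k)  # ARPACK's implicitly restarted Lanczos under the hood

print(u.shape, s.shape, vt.shape)  # (1000, 10) (10,) (10, 500)
```

No per-pass job launch: each Lanczos iteration is just a sparse matrix-vector product in memory.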
On Wed, Apr 6, 2011 at 11:25 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> The key is the k passes. This bounds the time from below for large values
> of k, since it typically takes tens of seconds to light up a mapreduce job.
> Larger clusters can actually be worse for this computation because of that.
>
> On Wed, Apr 6, 2011 at 11:16 AM, Jake Mannix <jake.mannix@gmail.com> wrote:
>
>> ... Lanczos-based SVD, for k singular values, requires k passes over the
>> data, and each row which has d nonzero entries will do d^2 computations
>> in each pass. ...
>>
>>
>> I guess "how long" depends on how big the cluster is!
>>
>
>
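The cost model quoted above can be put into a hedged back-of-envelope sketch: k passes over the data, each MapReduce pass paying a fixed startup cost, and each row with d nonzeros doing on the order of d^2 work per pass. The function names and the 20-second startup figure are illustrative assumptions, not measurements.

```python
# Back-of-envelope sketch of the thread's cost model (assumed numbers).

def mapreduce_startup_lower_bound(k, startup_seconds=20.0):
    """Lower bound on wall-clock time for a MapReduce Lanczos SVD:
    k passes means k jobs, each paying the fixed startup cost,
    regardless of cluster size."""
    return k * startup_seconds

def per_pass_flops(rows_nnz, k):
    """Computation estimate from the quoted model: each row with d nonzero
    entries does ~d^2 operations in each of the k passes."""
    return k * sum(d * d for d in rows_nnz)

# Example: k = 100 singular values, assumed 20 s startup per job.
print(mapreduce_startup_lower_bound(100))   # 2000.0 seconds of startup alone
print(per_pass_flops([5, 10, 3], k=100))    # 100 * (25 + 100 + 9) = 13400
```

This is why the startup cost, not the arithmetic, dominates for large k: the 2000 seconds above is paid before any useful work happens.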
