mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: SVD and input args
Date Tue, 06 Jul 2010 01:07:35 GMT
It scales better than producing the vectors does!

Seriously, whatever is producing the vectors can easily produce counts, even
if there are many counts.  The SVD driver code can read and summarize many,
many counts in essentially zero time.
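
To make that concrete, here is a rough sketch of a vectorizer that tracks both dimensions while it emits vectors. This is not Mahout code: the function name `vectorize_with_counts` and the dict-based sparse vectors are assumptions made purely for illustration. The point is only that the row and column counts fall out of the vectorizing pass for free, so no second pass over the output is needed.

```python
from collections import Counter


def vectorize_with_counts(documents):
    """Build sparse term-count vectors and report the matrix dimensions.

    `documents` is an iterable of token lists (hypothetical input format).
    Returns (vectors, num_rows, num_cols).
    """
    term_index = {}   # term -> column index, grows as new terms are seen
    vectors = []      # one sparse vector (dict of column -> count) per document

    for tokens in documents:
        vector = {}
        for term, count in Counter(tokens).items():
            # Assign the next free column the first time a term is observed.
            col = term_index.setdefault(term, len(term_index))
            vector[col] = count
        vectors.append(vector)

    num_rows = len(vectors)      # number of documents vectorized
    num_cols = len(term_index)   # number of distinct terms observed
    return vectors, num_rows, num_cols


docs = [["svd", "input", "svd"], ["input", "args"], []]
vectors, num_rows, num_cols = vectorize_with_counts(docs)
```

In a distributed job each worker would keep its own tallies and the driver would combine them, which is the "read and summarize many counts" step; that part is left out of this sketch.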

On Mon, Jul 5, 2010 at 4:46 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> > Yes and no.  The number of rows should be the number of documents you
> > vectorized.  The number of columns should be the number of distinct terms
> > that you observed in vectorizing.  Both should be pretty easily
> > available.
>
> Yeah, I can count the rows w/ the VectorDumper, but that doesn't really
> scale.
