mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SVD and input args
Date Mon, 05 Jul 2010 23:46:27 GMT

On Jul 5, 2010, at 7:14 PM, Ted Dunning wrote:

> On Mon, Jul 5, 2010 at 2:59 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> 
>> Trying out SVD for the first time and trying to make sense of the
>> parameters...
>> 
>> Am I missing a more obvious way to get the number of rows to give to SVD
>> than to iterate through the whole sequence file of vectors and count them
>> up?
> 
> 
> Pretty much.  But you can also integrate that task into the production of
> the vectors.
> 
> 
>> Assuming a sufficiently large vector file, don't I need an M/R job to do
>> this?  Likewise, one would have to do this for the --numCols as well, right?
>> In reality, I suppose it would be useful to have a utility that checked to
>> make sure all the vectors in a file have the same cardinality, right?
>> 
> 
> Yes and no.  The number of rows should be the number of documents you
> vectorized.  The number of columns should be the number of distinct terms
> that you observed in vectorizing.  Both should be pretty easily available.

Yeah, I can count the rows w/ the VectorDumper, but that doesn't really scale.  Just wondering
if I was missing some tool that people are using.
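
A minimal Python sketch of the two approaches discussed above. It rests on stated assumptions: it is not Mahout code, `vectorize` and `scan_vectors` are hypothetical helpers, and each vector is modeled as a `(cardinality, {column: count})` pair rather than a Mahout `VectorWritable` in a Hadoop sequence file. `vectorize` records the row count (documents vectorized) and the column count (distinct terms observed) while vectorizing, as Ted suggests. `scan_vectors` is the fallback for an existing vector file: a single pass that counts rows and checks that every vector has the same cardinality.

```python
from collections.abc import Iterable, Sequence
from typing import Optional


def vectorize(docs: Iterable[Sequence[str]]):
    """Hypothetical helper: build sparse term-count vectors and, in the
    same pass, learn numRows (documents seen) and numCols (distinct terms)."""
    dictionary: dict[str, int] = {}  # term -> column index
    sparse_rows: list[dict[int, int]] = []
    for tokens in docs:
        vec: dict[int, int] = {}
        for term in tokens:
            col = dictionary.setdefault(term, len(dictionary))
            vec[col] = vec.get(col, 0) + 1
        sparse_rows.append(vec)
    num_rows = len(sparse_rows)
    num_cols = len(dictionary)
    # The final dictionary size is the cardinality shared by every row.
    rows = [(num_cols, vec) for vec in sparse_rows]
    return rows, num_rows, num_cols


def scan_vectors(vectors: Iterable[tuple[int, dict[int, int]]]) -> tuple[int, Optional[int]]:
    """Hypothetical helper: one pass over existing vectors that counts rows
    and raises if any vector's cardinality differs from the first one's."""
    num_rows = 0
    cardinality: Optional[int] = None
    for size, _entries in vectors:
        if cardinality is None:
            cardinality = size
        elif size != cardinality:
            raise ValueError(
                f"row {num_rows} has cardinality {size}, expected {cardinality}"
            )
        num_rows += 1
    return num_rows, cardinality


if __name__ == "__main__":
    rows, n_rows, n_cols = vectorize([["svd", "input", "args"], ["svd", "rows"], []])
    print(n_rows, n_cols)      # 3 4
    print(scan_vectors(rows))  # (3, 4)
```

For a vector file too large for a single reader, the per-vector logic in `scan_vectors` would presumably have to run inside a Map/Reduce job, with partial row counts summed and the cardinalities seen in each split compared in the reducer.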

