mahout-user mailing list archives

From Grant Ingersoll <>
Subject Re: SVD and input args
Date Mon, 05 Jul 2010 23:46:27 GMT

On Jul 5, 2010, at 7:14 PM, Ted Dunning wrote:

> On Mon, Jul 5, 2010 at 2:59 PM, Grant Ingersoll <> wrote:
>> Trying out SVD for the first time and trying to make sense of the
>> parameters...
>> Am I missing a more obvious way to get the number of rows to give to SVD
>> than to iterate through the whole sequence file of vectors and count them
>> up?
> Pretty much.  But you can also integrate that task into the production of
> the vectors.
>> Assuming a sufficiently large vector file, don't I need a M/R job to do
>> this?  Likewise, one would have to do this for the --numCols as well, right?
>> In reality, I suppose it would be useful to have a utility that checked to
>> make sure all the vectors in a file are the same cardinality, right?
> Yes and no.  The number of rows should be the number of documents you
> vectorized.  The number of columns should be the number of distinct terms
> that you observed in vectorizing.  Both should be pretty easily available.

Yeah, I can count the rows with the VectorDumper, but that doesn't really scale.  Just wondering
if I was missing some tool that people are using.
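
For illustration, here is a minimal sketch of the "integrate that task into the production of the vectors" suggestion from the quoted reply: count rows and check cardinality while the vectors stream past on their way to the writer, so no second pass over the file is needed. This is plain Python, not Mahout or Hadoop code. `ShapeTracker` and `track_shape` are hypothetical names, and `len(vector)` stands in for the vector's declared cardinality.

```python
from typing import Iterable, Iterator, Optional, Sequence


class ShapeTracker:
    """Counts rows and checks that every vector has the same cardinality."""

    def __init__(self) -> None:
        self.num_rows = 0
        self.num_cols: Optional[int] = None  # unknown until the first vector

    def observe(self, vector: Sequence[float]) -> None:
        # Assumption: len() plays the role of the vector's cardinality.
        if self.num_cols is None:
            self.num_cols = len(vector)
        elif len(vector) != self.num_cols:
            raise ValueError(
                f"row {self.num_rows}: cardinality {len(vector)} "
                f"!= expected {self.num_cols}"
            )
        self.num_rows += 1


def track_shape(
    vectors: Iterable[Sequence[float]], tracker: ShapeTracker
) -> Iterator[Sequence[float]]:
    """Pass vectors through unchanged while recording their shape."""
    for vector in vectors:
        tracker.observe(vector)
        yield vector


if __name__ == "__main__":
    docs = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
    tracker = ShapeTracker()
    # list() is a stand-in for whatever actually writes the vectors out.
    written = list(track_shape(docs, tracker))
    print(tracker.num_rows, tracker.num_cols)
```

The two numbers recorded by the tracker are what would be passed as the row and column counts. In a distributed job the same bookkeeping would have to be aggregated across tasks, which this single-process sketch does not attempt.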
