mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SVD Expectations
Date Sun, 29 Aug 2010 21:13:24 GMT
It should be noted that cranking the rank down to 20 produces a significantly smaller result.
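
A back-of-the-envelope sketch of why the rank might drive the output size, using the figures from the quoted message below. It assumes (not confirmed here) that each of the `rank` output vectors is stored densely, with one 8-byte double per column, and it ignores any serialization overhead. The helper name `dense_eigenvector_bytes` is made up for illustration.

```python
# Rough size estimate for the SVD output.
# Assumption: each of the `rank` eigenvectors is dense and holds one
# 8-byte double per column; file-format overhead is ignored.
BYTES_PER_DOUBLE = 8


def dense_eigenvector_bytes(rank, num_cols):
    """Return the raw byte count for `rank` dense vectors of length `num_cols`."""
    return rank * num_cols * BYTES_PER_DOUBLE


num_cols = 65458  # --numCols from the quoted command
for rank in (200, 20):
    size_mb = dense_eigenvector_bytes(rank, num_cols) / 1e6
    print(f"rank={rank}: ~{size_mb:.1f} MB")
```

Under that assumption, rank 200 works out to roughly 104.7 MB, which is close to the 104 MB reported for svdOut, and rank 20 to roughly 10.5 MB. A sparse input can therefore be smaller than its dense decomposition even though the decomposition has far fewer rows.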


On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:

> I'm running SVD as:
> ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103
> ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
> 
> part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB and largestCleanEigens is 88 MB. For some reason, this really doesn't feel right.
> 
> Is there a guide on interpreting the output of SVD anywhere? Intuitively, I believe the output should be a lot smaller. I mean, that's the point, right?
> 
> I can share the vector if you want.
> 
> -Grant
> 
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> 

--------------------------
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8

