mahout-user mailing list archives

From Akshay Bhat <akshayub...@gmail.com>
Subject Re: SVD Expectations
Date Sun, 29 Aug 2010 22:29:37 GMT
Even though SVD is supposed to reduce dimensionality, that does not mean
your results will be smaller [in terms of memory], since U, S and V are
dense matrices. They only come out smaller if you keep very few
eigenvectors. Your input matrix is sparse; had it been represented as a
dense matrix, it would be far larger.


On Sun, Aug 29, 2010 at 5:13 PM, Grant Ingersoll <gsingers@apache.org> wrote:

> Should be noted, that cranking the rank down to 20 produces a significantly
> smaller result.
>
>
> On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:
>
> > I'm running SVD as:
> > ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir
> /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200
> --numCols 65458 --numRows  130103
> >  ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput
> /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal
> --maxError 0.1 --minEigenvalue 10.0
> >
> > part-out.vec is 52 MB.  The output from SVD  (svdOut) is 104 MB and
> largestCleanEigens is 88 MB.  For some reason, this really doesn't feel
> right.
> >
> > Is there a guide on interpreting the output of SVD anywhere?
>  Intuitively, I believe the output should be a lot smaller?   I mean that's
> the point, right?
> >
> > I can share the vector if you want.
> >
> > -Grant
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> >
>
> --------------------------
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>
>


-- 
Akshay Uday Bhat.
Graduate Student, Computer Science, Cornell University
Website: http://www.akshaybhat.com
