mahout-user mailing list archives

From Stefan Wienert <ste...@wienert.cc>
Subject Need a little help with SVD / Dimensional Reduction
Date Sun, 05 Jun 2011 20:16:41 GMT
Hi,

I have just read this page:
https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction

This looks similar to LSA/LSI, but I have some questions:

In this example, the "tfidf-vectors"-matrix has 6,076,937 rows and
20,444 columns.
My first question is: Do the rows represent the documents and the
columns the terms in a traditional term-document-matrix?

After the SVD job, you get 87 eigenvectors, each with 20,444
columns (representing the terms).
These seem to be the eigenvectors of tfidf-vectors, but reduced to only
87 documents? What is this mathematically?

And why do you calculate tfidf-vectors^T * svdOut^T? I cannot find
an explanation for this myself.

Compared to a regular SVD, is the result the "right singular vectors"?
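
To make the questions concrete, here is a minimal pure-Python sketch of how I currently read these steps. It rests on assumptions I am not sure about: that rows are documents and columns are terms, that each row of svdOut is an eigenvector of A^T * A (i.e. a right singular vector of A), and that the final product is meant to yield A * V_k^T, one reduced row per document. The toy matrix, the hand-picked eigenvectors and the helper names are made up for illustration; none of this is Mahout's API.

```python
import math

def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain dense matrix product a * b."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def is_eigenvector(m, v, lam, tol=1e-9):
    """Check m * v == lam * v up to a small tolerance."""
    mv = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]
    return all(abs(x - lam * y) <= tol for x, y in zip(mv, v))

# Toy stand-in for tfidf-vectors: 4 documents (rows) x 3 terms (columns).
A = [
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 1.0],
    [0.0, 2.0, 2.0],
]

# Toy stand-in for svdOut: k = 2 rows, each with one entry per term.
# Assumption: these rows are eigenvectors of A^T * A (hand-computed here).
s = 1.0 / math.sqrt(2.0)
V_k = [
    [0.0, s, s],      # eigenvalue 10 of A^T * A
    [1.0, 0.0, 0.0],  # eigenvalue 5 of A^T * A
]
eigenvalues = [10.0, 5.0]

# A^T * A is terms x terms, so its eigenvectors live in term space.
gram = matmul(transpose(A), A)
checks = [is_eigenvector(gram, v, lam) for v, lam in zip(V_k, eigenvalues)]

# Projecting every document onto the k eigenvectors: (docs x terms) * (terms x k).
reduced = matmul(A, transpose(V_k))

print(checks)
print(len(reduced), "documents x", len(reduced[0]), "dimensions")
```

If this reading is right, the 87 rows of svdOut are directions in term space rather than 87 documents, and the multiplication step turns each of the 6,076,937 documents into an 87-dimensional vector.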

I know it works, but I don't understand some of these steps. Please help... :)

-- 
Stefan Wienert

http://www.wienert.cc
stefan@wienert.cc

Phone: +495251-2026838 (new number since 20.06.10)
Mobile: +49176-40170270
