Hi,
After reading this:
https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
This looks similar to LSA/LSI, but I have some questions:
In this example, the "tfidf-vectors" matrix has 6,076,937 rows and
20,444 columns.
My first question is: do the rows represent the documents and the
columns the terms, as in a traditional term-document matrix?
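For clarity, this is the layout I am assuming, as a toy Python sketch (not Mahout code): one row per document, one column per term. The weighting here is plain tf * log(N/df), which may differ from what Mahout's tf-idf job actually computes; the function name is my own.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Dense tf-idf matrix with one row per document and one column per term.
    Toy weighting: raw term frequency * log(N / document frequency)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # document frequency: in how many documents each term occurs
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        rows.append([tf[term] * math.log(n_docs / df[term]) for term in vocab])
    return vocab, rows

docs = ["the cat sat", "the dog sat", "a cat and a dog"]
vocab, A = tfidf_matrix(docs)
print(len(A), "documents x", len(vocab), "terms")  # 3 documents x 6 terms
```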
So after the SVD job, you get these 87 eigenvectors, each with 20,444
columns (representing the terms).
These seem to be eigenvectors of tfidf-vectors, but reduced to only
87 documents? What is this, mathematically?
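My guess, based only on the standard SVD A = U Σ V^T with A = tfidf-vectors (documents x terms): a vector with one entry per term can only be an eigenvector of A^T A, which is terms x terms, so 87 would be the number of singular vectors kept rather than a number of documents. A toy sketch of that dimension argument in plain Python (made-up matrix, my own helper name):

```python
def gram(A):
    """Return A^T A for a row-major matrix A (documents x terms).
    The result is terms x terms, so its eigenvectors have one entry per term."""
    n_cols = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n_cols)]
            for i in range(n_cols)]

# toy matrix: 4 documents x 3 terms
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [3.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]]
G = gram(A)
print(len(G), "x", len(G[0]))  # 3 x 3: one row/column per term, not per document
```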
And why do you calculate tfidf-vectors^T * svdOut^T? I cannot find
an explanation for this myself.
Compared to the SVD, is the result the "right singular vectors"?
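To compare against, here is the textbook identity as I understand it, assuming the SVD job returns eigenvectors of A^T A (the right singular vectors v_i of A = U Σ V^T): A v_i = σ_i u_i, so multiplying the document matrix by those vectors projects each document into the reduced space and gives the rows of U Σ. I have not checked how the transposes in the Mahout multiplication step map onto this. A toy power-iteration sketch in plain Python (made-up matrix; all helper names are my own):

```python
import math

def mat_vec(A, x):
    # A x for a row-major matrix A
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_t_vec(A, y):
    # A^T y without forming the transpose
    return [sum(A[r][c] * y[r] for r in range(len(A))) for c in range(len(A[0]))]

def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def top_right_singular_vector(A, iters=200):
    """Power iteration on A^T A: converges to its top eigenvector,
    i.e. the first right singular vector v1 (one entry per term)."""
    v = normalize([1.0] * len(A[0]))
    for _ in range(iters):
        v = normalize(mat_t_vec(A, mat_vec(A, v)))
    return v

# toy matrix: 4 documents x 3 terms
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0],
     [3.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]]
v1 = top_right_singular_vector(A)
projected = mat_vec(A, v1)       # A v1 = sigma1 * u1: one value per document
sigma1 = math.sqrt(sum(p * p for p in projected))
u1 = [p / sigma1 for p in projected]
print(len(v1), len(projected))   # 3 terms, 4 documents
```

The projection has one coordinate per document, which is why I would expect the final result to describe documents rather than terms.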
I know it works, but I don't understand some of these steps. Please help... :)
--
Stefan Wienert
http://www.wienert.cc
stefan@wienert.cc
Phone: +495251-2026838 (new number since 20.06.10)
Mobile: +49176-40170270