mahout-user mailing list archives

From Stefan Wienert <ste...@wienert.cc>
Subject Re: Need a little help with SVD / Dimensional Reduction
Date Mon, 06 Jun 2011 12:22:01 GMT
https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction

What is done:

Input:
tf-idf-matrix (docs x terms) 6076937 x 20444

"SVD" of tf-idf-matrix (rank 100) produces the eigenvectors (and
eigenvalues) of tf-idf-matrix, called:
svd (concepts x terms) 87 x 20444

transpose tf-idf-matrix:
tf-idf-matrix-transpose (terms x docs) 20444 x 6076937

transpose svd:
svd-transpose (terms x concepts) 20444 x 87

matrix multiply:
tf-idf-matrix-transpose x svd-transpose = result
(terms x docs) x (terms x concepts) = (docs x concepts)
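
The shape bookkeeping above can be checked on a toy example. This is a minimal sketch in plain Python, using made-up values: a 4 x 3 stand-in for tf-idf-matrix and a 2 x 3 stand-in for svd. The helper names (transpose, matmul_at_b, shape) are hypothetical and are not Mahout APIs. It assumes the multiply step computes A^T * B from its two inputs, which is what would let (terms x docs) and (terms x concepts) combine into (docs x concepts).

```python
def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def matmul_at_b(a, b):
    """Compute A^T * B for list-of-rows matrices with equal row counts.

    Assumption: this mirrors a multiply step that transposes its first input.
    """
    if len(a) != len(b):
        raise ValueError("A and B must have the same number of rows")
    at = transpose(a)   # rows of A^T
    bt = transpose(b)   # each row of bt is a column of B
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in at]


def shape(m):
    """Return (rows, columns) of a list-of-rows matrix."""
    return (len(m), len(m[0]))


# Toy stand-ins: 4 docs, 3 terms, 2 concepts (values are made up).
tfidf = [[1.0, 0.0, 2.0],
         [0.0, 1.0, 0.0],
         [3.0, 0.0, 1.0],
         [0.0, 2.0, 0.0]]      # docs x terms
svd = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 0.0]]        # concepts x terms

tfidf_t = transpose(tfidf)     # terms x docs
svd_t = transpose(svd)         # terms x concepts
result = matmul_at_b(tfidf_t, svd_t)   # docs x concepts

print(shape(tfidf_t), shape(svd_t), shape(result))
```

With these inputs the result has one row per document and one column per concept, matching the dimensions written above.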

So... I do understand that the "svd" here is not the SVD from Wikipedia.
It only runs the Lanczos algorithm and some magic, which produces (quoting):
> Instead either the left or right (but usually the right) eigenvectors
> premultiplied by the diagonal or the square root of the diagonal element.
from http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=Rta7tfRm8Zi60VcFya5xF+dbFrJ8pcds2N0-V@mail.gmail.com%3E

So my question: what is the output of the SVD in Mahout? And what do I
have to calculate to get the "right singular vectors" from svd?
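
For reference, the textbook relation behind this question can be sketched as follows: the eigenvectors of A^T A are the right singular vectors v of A, and A v = sigma u ties them to the left singular vectors u. The sketch below uses plain power iteration (not Lanczos, and not Mahout code) on a made-up 3 x 2 matrix, and the function names are hypothetical. Whether Mahout's output is additionally scaled by the singular values is the open question here and is not assumed.

```python
import math


def matvec(m, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def top_right_singular_vector(a, iterations=200):
    """Power iteration on A^T A.

    Returns (sigma, v, u) for the largest singular value of A, where v is
    the right singular vector and u = A v / sigma is the left one.
    """
    at = transpose(a)
    v = [1.0] * len(a[0])
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))   # (A^T A) v
        n = norm(w)
        v = [x / n for x in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av]
    return sigma, v, u


# Made-up matrix: 3 rows (docs) x 2 columns (terms).
a = [[3.0, 0.0],
     [0.0, 2.0],
     [0.0, 0.0]]
sigma, v, u = top_right_singular_vector(a)
print(sigma, v, u)
```

Here v has one entry per term and u has one entry per document, so projecting the rows of A onto v (that is, computing A v) gives the document-side coordinates scaled by sigma.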

Thanks,
Stefan

2011/6/6 Stefan Wienert <stefan@wienert.cc>:
> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>
> the last step is the matrix multiplication:
>  --arg --numRowsA --arg 20444 \
>  --arg --numColsA --arg 6076937 \
>  --arg --numRowsB --arg 20444 \
>  --arg --numColsB --arg 87 \
> so the result is a 6,076,937 x 87 matrix
>
> The input has 6,076,937 documents (each with 20,444 terms), so the result of
> the matrix multiplication has to be the right singular vectors, judging by
> the dimensions.
>
> So the result is the "concept-document vector matrix" (I think
> this is also called "document vectors"?).
>
> 2011/6/6 Ted Dunning <ted.dunning@gmail.com>:
>> Yes.  These are term vectors, not document vectors.
>>
>> There is an additional step that can be run to produce document vectors.
>>
>> On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert <stefan@wienert.cc> wrote:
>>
>>> Compared to SVD, is the result the "right singular vectors"?
>>>
>>
>
>
>
> --
> Stefan Wienert
>
> http://www.wienert.cc
> stefan@wienert.cc
>
> Telefon: +495251-2026838
> Mobil: +49176-40170270
>



-- 
Stefan Wienert

http://www.wienert.cc
stefan@wienert.cc

Telefon: +495251-2026838
Mobil: +49176-40170270
