mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Danny Bickson <danny.bick...@gmail.com>
Subject Re: Need a little help with SVD / Dimensional Reduction
Date Mon, 06 Jun 2011 14:34:57 GMT
Hi
Mahout SVD implementation computes the Lanzcos iteration:
http://en.wikipedia.org/wiki/Lanczos_algorithm
Denote the non-square input matrix as M. First a symmetric matrix A is
computed by A=M'*M
Then an approximating tridiagonal matrix T and a vector matrix V are
computed such that A =~ V*T*V'
(this process is done in a distributed way).

Next the matrix T is next decomposed into eigenvectors and eignevalues.
Which is the returned result. (This process
is serial).

The third step makes the returned eigenvectors orthogonal to each other
(which is optional IMHO).

The heart of the code is found at:
./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
At least that is where it was in version 0.4 I am not sure if there are
changes in version 0.5

Anyway, Mahout does not compute directly SVD. If you are interested in
learning more about the relation to SVD
look at: http://en.wikipedia.org/wiki/Singular_value_decomposition,
subsection: relation to eigenvalue decomposition.

Hope this helps,

Danny Bickson

On Mon, Jun 6, 2011 at 9:35 AM, Stefan Wienert <stefan@wienert.cc> wrote:

> After reading this thread:
>
> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTinQ5K4XrM7naBWn8qoBXZGVobBot2RtjZSV4yOd@mail.gmail.com%3E
>
> Wiki-SVD: M = U S V* (* = transposed)
>
> The output of Mahout-SVD is (U S) right?
>
> So... How do I get V from (U S)  and M?
>
> Is V = M (U S)* (because this is, what the calculation in the example is)?
>
> Thanks
> Stefan
>
> 2011/6/6 Stefan Wienert <stefan@wienert.cc>:
> > https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
> >
> > What is done:
> >
> > Input:
> > tf-idf-matrix (docs x terms) 6076937 x 20444
> >
> > "SVD" of tf-idf-matrix (rank 100) produces the eigenvector (and
> > eigenvalues) of tf-idf-matrix, called:
> > svd (concepts x terms) 87 x 20444
> >
> > transpose tf-idf-matrix:
> > tf-idf-matrix-transpose (terms x docs) 20444 x 6076937
> >
> > transpose svd:
> > svd-transpose (terms x concepts) 20444 x 87
> >
> > matrix multiply:
> > tf-idf-matrix-transpose x svd-transpose = result
> > (terms x docs) x (terms x concepts) = (docs x concepts)
> >
> > so... I do understand, that the "svd" here is not SVD from wikipedia.
> > It only does the Lanczos algorithm and some magic which produces the
> >> Instead either the left or right (but usually the right) eigenvectors
> premultiplied by the diagonal or the square root of the
> >> diagonal element.
> > from
> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=Rta7tfRm8Zi60VcFya5xF+dbFrJ8pcds2N0-V@mail.gmail.com%3E
> >
> > so my question: what is the output of the SVD in mahout. And what do I
> > have to calculate to get the "right singular value" from svd?
> >
> > Thanks,
> > Stefan
> >
> > 2011/6/6 Stefan Wienert <stefan@wienert.cc>:
> >>
> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
> >>
> >> the last step is the matrix multiplication:
> >>  --arg --numRowsA --arg 20444 \
> >>  --arg --numColsA --arg 6076937 \
> >>  --arg --numRowsB --arg 20444 \
> >>  --arg --numColsB --arg 87 \
> >> so the result is a 6,076,937 x 87 matrix
> >>
> >> the input has 6,076,937 (each with 20,444 terms). so the result of
> >> matrix multiplication has to be the right singular value regarding to
> >> the dimensions.
> >>
> >> so the result is the "concept-document vector matrix" (as I think,
> >> these is also called "document vectors" ?)
> >>
> >> 2011/6/6 Ted Dunning <ted.dunning@gmail.com>:
> >>> Yes.  These are term vectors, not document vectors.
> >>>
> >>> There is an additional step that can be run to produce document
> vectors.
> >>>
> >>> On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert <stefan@wienert.cc>
> wrote:
> >>>
> >>>> compared to SVD, is the result is the "right singular value"?
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Stefan Wienert
> >>
> >> http://www.wienert.cc
> >> stefan@wienert.cc
> >>
> >> Telefon: +495251-2026838
> >> Mobil: +49176-40170270
> >>
> >
> >
> >
> > --
> > Stefan Wienert
> >
> > http://www.wienert.cc
> > stefan@wienert.cc
> >
> > Telefon: +495251-2026838
> > Mobil: +49176-40170270
> >
>
>
>
> --
> Stefan Wienert
>
> http://www.wienert.cc
> stefan@wienert.cc
>
> Telefon: +495251-2026838
> Mobil: +49176-40170270
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message