mahout-user mailing list archives

From Chris Hokamp <chris.hok...@gmail.com>
Subject Re: LSI using Mahout ssvd - folding a new doc into the space
Date Fri, 29 Jun 2012 22:13:35 GMT
Thanks for the quick response. So I will create a new diagonal matrix with
the reciprocals of the singular values, and multiply by that. I took a look
at the slides (very nice presentation!), but it seems that I won't even need
to go that far, as I should be able to compute E^(-1) x U^(T) x docvector,
where E is the diagonal matrix of singular values and U is available from
the output of ssvd. I'm basing this assumption on pages 2/3 of [1].
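
To make the product E^(-1) x U^(T) x docvector concrete, here is a minimal
sketch in plain Python. It assumes the decomposed matrix has one row per term
and one column per document, so that U has one row per term; if documents
are the rows instead, V would take the place of U. The function name fold_in
and the toy values of U and sigma are made up for illustration and are not
Mahout output or part of any Mahout API.

```python
def fold_in(doc, U, sigma):
    """Project a term-space vector into the k-dimensional LSI space.

    Computes E^(-1) x U^(T) x doc, where E is diagonal and therefore
    inverted by taking the reciprocal of each singular value.
    Assumes U has one row per term and one column per singular value.
    """
    k = len(sigma)
    # U^T x doc: dot product of each column of U with the document vector.
    projected = [sum(U[i][j] * doc[i] for i in range(len(doc))) for j in range(k)]
    # E^(-1) x (...): divide each coordinate by its singular value.
    return [p / s for p, s in zip(projected, sigma)]


# Toy factors (3 terms, k = 2); the columns of U are orthonormal.
U = [[1.0, 0.0],
     [0.0, 0.6],
     [0.0, 0.8]]
sigma = [5.0, 2.0]

# First column of A = U x E x V^T for V = [[0.6, 0.8], [0.8, -0.6]].
doc = [3.0, 0.96, 1.28]

# Close to [0.6, 0.8], which is that document's row of V.
print(fold_in(doc, U, sigma))
```

Folding in a document that was already part of the decomposed matrix should
reproduce its existing coordinates, which makes for a quick sanity check
before projecting genuinely new documents or queries.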

Thanks again for the help,
Chris

[1]
https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.data/SSVD-CLI.pdf

On Fri, Jun 29, 2012 at 4:31 PM, Sean Owen <srowen@gmail.com> wrote:

> Well the inverse of a diagonal matrix like that is just going to be a
> diagonal matrix holding the reciprocals (1/x) of the values. That much
> is easy. But you need to invert more than that to fold in.
>
> I admit even I don't know the details of the Mahout implementation
> you're using, but I imagine the overall principle is the same as the
> fold-in described in ... oh wait, look at that, in a preso I posted a
> while ago: http://www.slideshare.net/srowen/matrix-factorization  Look
> at the last few slides; I think it's kind of a useful / simple way to
> think of it.
>
> Sean
>
> On Fri, Jun 29, 2012 at 10:27 PM, Chris Hokamp <chris.hokamp@gmail.com>
> wrote:
> > Hi all,
> >
> > I'm trying to implement Latent Semantic Indexing using the Mahout ssvd
> > tool, and I'm having trouble understanding how I can use the output of
> > Mahout ssvd to 'fold' new queries (documents) into the LSI space.
> > Specifically, I can't find a way to multiply a vector representing a
> > query by the inverse of the matrix of singular values: I can't find a
> > way to compute the inverse of the diagonal matrix of singular values.
> >
> > I can generate the output matrices using ssvd, and compare document/term
> > vectors using cosine similarity, but I'm stumped when it comes to
> > folding a new document into the space.
> >
> > Any thoughts or guidance would be appreciated.
> >
> > Cheers,
> > Chris
>
