mahout-user mailing list archives

From Chris Hokamp <>
Subject Re: LSI using Mahout ssvd - folding a new doc into the space
Date Fri, 29 Jun 2012 22:13:35 GMT
Thanks for the quick response. So I will create a new diagonal matrix with
the reciprocals of the singular values, and multiply by that. I took a look at
the slides (very nice presentation!), but it seems that I won't even need
to go that far, as I should be able to compute E^(-1) x U^(T) x docvector,
where E is the diagonal matrix of singular values and U is available from the
output of ssvd. I'm basing this assumption on pages 2/3 of [1].
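
For anyone following along, here is a minimal sketch of that product in plain
Python. It assumes U and the singular values have already been read out of the
ssvd output into ordinary lists; the function name fold_in and its parameter
names are made up for illustration and are not part of Mahout.

```python
def fold_in(U, sigma, doc):
    """Project a term-space document vector into the k-dimensional LSI space.

    U     : list of rows, shape (num_terms, k), the left singular vectors
    sigma : list of the k singular values (the diagonal of E)
    doc   : list of num_terms term weights for the new document
    All three are assumed to be already loaded into memory.
    """
    k = len(sigma)
    # U^T x docvector: dot each column of U with the document vector.
    projected = [sum(U[t][j] * doc[t] for t in range(len(doc))) for j in range(k)]
    # E^(-1) x (...): inverting a diagonal matrix just divides by each entry.
    return [projected[j] / sigma[j] for j in range(k)]


# Toy data: 3 terms, 2 latent dimensions.
U = [[1.0, 0.0],
     [0.0, 1.0],
     [0.0, 0.0]]
sigma = [2.0, 0.5]
print(fold_in(U, sigma, [4.0, 1.0, 7.0]))
```

The new document vector should be built with the same vocabulary and term
weighting as the columns of the original matrix, otherwise the folded-in
vector will not be comparable to the existing document vectors.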

Thanks again for the help,


On Fri, Jun 29, 2012 at 4:31 PM, Sean Owen <> wrote:

> Well the inverse of a diagonal matrix like that is just going to be a
> diagonal matrix holding the reciprocals (1/x) of the values. That much
> is easy. But you need to invert more than that to fold in.
> I admit even I don't know the details of the Mahout implementation
> you're using, but I imagine the overall principle is the same as the
> fold-in described in ... oh wait, look at that, in a preso I posted a
> while ago:  Look
> at the last few slides; I think it's kind of a useful / simple way to
> think of it.
> Sean
> On Fri, Jun 29, 2012 at 10:27 PM, Chris Hokamp <> wrote:
> > Hi all,
> >
> > I'm trying to implement Latent Semantic Indexing using the Mahout ssvd
> > tool, and I'm having trouble understanding how I can use the output of
> > Mahout ssvd to 'fold' new queries (documents) into the LSI space.
> > Specifically, I can't find a way to multiply a vector representing a
> > query by the inverse of the matrix of singular values: I can't find a
> > way to solve for the inverse of the diagonal matrix of singular values.
> >
> > I can generate the output matrices using ssvd, and compare document/term
> > vectors using cosine similarity, but I'm stumped when it comes to
> > folding a new document into the space.
> >
> > Any thoughts or guidance would be appreciated.
> >
> > Cheers,
> > Chris
