mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: LSI using Mahout ssvd - folding a new doc into the space
Date Fri, 29 Jun 2012 22:42:16 GMT
Yes, the two are saying the same thing in different ways.

What you really need, to project a new column of A into V, is the
(pseudo-)inverse (U * sigma)^-1. This is sigma^-1 * U^-1. Here
U^-1 = U^T (as a left inverse, since U^T * U = I) because the SVD gives
you orthonormal bases in U and V -- that's a nice property of what the
SVD computes, because it pulls the scaling factors out into sigma (and
this is why the SVD takes more work than other decompositions). So that
gives you your formula, sigma^-1 * U^T * docvector, and yes, you apply
it on the left.
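
As a rough illustration of that formula, here is a minimal NumPy sketch.
It assumes A is stored term-by-document (each column is a document) and
uses np.linalg.svd on a toy matrix as a stand-in for the U and sigma that
ssvd would write out; fold_in_column, k and the toy data are made-up names
for illustration, not part of Mahout.

```python
import numpy as np

def fold_in_column(U, sigma, a):
    """Project a new column a of A into V-space: sigma^-1 * U^T * a."""
    # Inverting the diagonal sigma is just an element-wise division.
    return (U.T @ a) / sigma

# Toy term-by-document matrix: 5 terms (rows) x 4 documents (columns).
A = np.array([
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [3.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
])

k = 2  # rank of the reduced space (an arbitrary choice for this sketch)
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
U, sigma, V = U_full[:, :k], s_full[:k], Vt_full[:k, :].T

# Sanity check: folding in a column already in A gives back its row of V.
folded = fold_in_column(U, sigma, A[:, 0])
```

Folding in a column that was already in A should reproduce that
document's row of V, which is a cheap way to check that your matrices
are oriented the way you think they are.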

The same works for new rows of A projecting into U: there you multiply
by V * sigma^-1 on the right.
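
The row case can be sketched the same way, again assuming NumPy as a
stand-in for the ssvd output and a hypothetical helper fold_in_row: a new
row of A is multiplied on the right by V and then scaled by sigma^-1.

```python
import numpy as np

def fold_in_row(V, sigma, a_row):
    """Project a new row of A into U-space: a_row * V * sigma^-1."""
    # Right-multiplying by the diagonal sigma^-1 scales coordinate i by 1/sigma_i.
    return (a_row @ V) / sigma

# Toy matrix A (illustrative data only).
A = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 1.0],
])

k = 2  # rank of the reduced space (an arbitrary choice for this sketch)
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
U, sigma, V = U_full[:, :k], s_full[:k], Vt_full[:k, :].T

# Sanity check: a row already in A maps back onto its own row of U.
folded_row = fold_in_row(V, sigma, A[1])
```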

I think my explanation and that paper play a little fast and loose
with what's a column and what's a row here but it's just a matter of a
transposition to get what you need.

On Fri, Jun 29, 2012 at 11:13 PM, Chris Hokamp <chris.hokamp@gmail.com> wrote:
> Thanks for the quick response. So I will create a new diagonal matrix with
> the reciprocals of the singular values, and multiply by that. I took a look at
> the slides (very nice presentation!), but it seems that I won't even need
> to go this far, as I should be able to take E^(-1) x U^(T) x docvector, and
> U is available from the output of ssvd. I'm basing this assumption on pages
> 2/3 of [1].
>
> Thanks again for the help,
> Chris
>
> [1]
> https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.data/SSVD-CLI.pdf
>
> On Fri, Jun 29, 2012 at 4:31 PM, Sean Owen <srowen@gmail.com> wrote:
>
>> Well the inverse of a diagonal matrix like that is just going to be a
>> diagonal matrix holding the reciprocals (1/x) of the values. That much
>> is easy. But you need to invert more than that to fold in.
>>
>> I admit even I don't know the details of the Mahout implementation
>> you're using, but I imagine the overall principle is the same as the
>> fold-in described in ... oh wait, look at that, in a preso I posted a
>> while ago: http://www.slideshare.net/srowen/matrix-factorization  Look
>> at the last few slides; I think it's kind of a useful / simple way to
>> think of it.
>>
>> Sean
>>
>> On Fri, Jun 29, 2012 at 10:27 PM, Chris Hokamp <chris.hokamp@gmail.com> wrote:
>> > Hi all,
>> >
>> > I'm trying to implement Latent Semantic Indexing using the mahout ssvd
>> > tool, and I'm having trouble understanding how I can use the output of
>> > ssvd Mahout to 'fold' new queries (documents) into the LSI space.
>> > Specifically, I can't find a way to multiply a vector representing a
>> > query by the inverse of the matrix of singular values - I can't find a
>> > way to solve for the inverse of the diagonal matrix of singular values.
>> >
>> > I can generate the output matrices using ssvd, and compare document/term
>> > vectors using cosine similarity, but I'm stumped when it comes to
>> > folding a new document into the space.
>> >
>> > Any thoughts or guidance would be appreciated.
>> >
>> > Cheers,
>> > Chris
>>
