mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: LSI using Mahout ssvd - folding a new doc into the space
Date Fri, 29 Jun 2012 22:39:50 GMT
Yes. The fold-in formulas are given in the link you mentioned, formulas
(2) and (3); you probably need only one of them, depending on which
way you are going. Usually you are folding in new documents (rows of
U), so you need formula (2) to add new folded-in rows.
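
A minimal sketch of the row-wise fold-in, assuming the corpus matrix A has documents as rows and terms as columns, so that SSVD gives A ≈ U Σ V^T with U (documents × k), V (terms × k) and k singular values. Because A v_j = σ_j u_j holds for every singular pair, each row of U equals the matching row of A times V Σ^(-1); we assume this is the relation formula (2) expresses, and apply it to a new document row d. Multiplying by Σ^(-1) only divides coordinate j by σ_j, so no general matrix inversion is needed. The function name `fold_in_document` and the plain-list representation are illustrative only, not part of Mahout. If the matrix is term × document instead (the orientation behind E^(-1) x U^(T) x docvector), the same computation applies with the roles of U and V swapped.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def fold_in_document(d: Sequence[float], V: Matrix, sigma: Sequence[float]) -> List[float]:
    """Project a new document row d (one weight per term) into the k-dim LSI space.

    Computes u = d * V * Sigma^(-1), where V is (terms x k) and sigma holds
    the k singular values. Hypothetical helper, not a Mahout API.
    """
    k = len(sigma)
    if len(d) != len(V):
        raise ValueError("document length must equal the number of rows of V")
    u = [0.0] * k
    for i, weight in enumerate(d):  # accumulate d * V
        for j in range(k):
            u[j] += weight * V[i][j]
    # Sigma is diagonal, so Sigma^(-1) just divides coordinate j by sigma[j].
    return [u[j] / sigma[j] for j in range(k)]


if __name__ == "__main__":
    # Toy factorization A = U * diag(sigma) * V^T with U = identity and
    # orthonormal columns in V, so folding in a row of A should give a row of U.
    V = [[1.0, 0.0],
         [0.0, 0.6],
         [0.0, 0.8]]
    sigma = [2.0, 1.0]
    A = [[2.0, 0.0, 0.0],   # 2 * first column of V  -> U row 0 is [1, 0]
         [0.0, 0.6, 0.8]]   # 1 * second column of V -> U row 1 is [0, 1]
    for row in A:
        print(fold_in_document(row, V, sigma))
```

Folding in an existing row of A reproduces the corresponding row of U (up to rounding), which is a quick sanity check before folding in genuinely new documents.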

Also, as the comment implies, your new observation vector for a document
is very sparse (a document is unlikely to contain all the tokens observed
in the corpus), so the actual computation of (2) can be optimized quite a
bit if V is indexed row-wise, so that the specific rows of V (which are
essentially dictionary vectors) can be yanked out very quickly.
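
A sketch of that sparse optimization, assuming the new document arrives as a mapping from term index to weight and that V is available through a row-indexed lookup (here a Python dict standing in for whatever row-wise store of V you actually use). Only the rows of V for terms that occur in the document are read, so the cost is proportional to (non-zero terms × k) rather than (vocabulary size × k). The name `fold_in_sparse` is hypothetical.

```python
from typing import Dict, List, Mapping, Sequence


def fold_in_sparse(doc: Dict[int, float],
                   V_rows: Mapping[int, Sequence[float]],
                   sigma: Sequence[float]) -> List[float]:
    """Compute u = d * V * Sigma^(-1) touching only the rows of V that d uses."""
    k = len(sigma)
    u = [0.0] * k
    for term, weight in doc.items():
        row = V_rows[term]  # fetch just this term's dictionary vector
        for j in range(k):
            u[j] += weight * row[j]
    # Apply the diagonal inverse by dividing each coordinate by its singular value.
    return [u[j] / sigma[j] for j in range(k)]


if __name__ == "__main__":
    # Only rows 1 and 2 of V are stored; row 0 is never needed for this document.
    V_rows = {1: [0.0, 0.6], 2: [0.0, 0.8]}
    sigma = [2.0, 1.0]
    print(fold_in_sparse({1: 0.6, 2: 0.8}, V_rows, sigma))
```

Because the lookup is per term, the rest of V can stay on disk or in a remote store; the example deliberately omits row 0 to show it is never read.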

-d

On Fri, Jun 29, 2012 at 3:13 PM, Chris Hokamp <chris.hokamp@gmail.com> wrote:
> Thanks for the quick response. So I will create a new diagonal matrix with
> the reciprocals of the singular values, and multiply by that. I took a look
> at the slides (very nice presentation!), but it seems that I won't even need
> to go this far, as I should be able to take E^(-1) x U^(T) x docvector, and
> U is available from the output of ssvd. I'm basing this assumption on pages
> 2/3 of [1].
>
> Thanks again for the help,
> Chris
>
> [1]
> https://cwiki.apache.org/MAHOUT/stochastic-singular-value-decomposition.data/SSVD-CLI.pdf
>
> On Fri, Jun 29, 2012 at 4:31 PM, Sean Owen <srowen@gmail.com> wrote:
>
>> Well the inverse of a diagonal matrix like that is just going to be a
>> diagonal matrix holding the reciprocals (1/x) of the values. That much
>> is easy. But you need to invert more than that to fold in.
>>
>> I admit even I don't know the details of the Mahout implementation
>> you're using, but I imagine the overall principle is the same as the
>> fold-in described in ... oh wait, look at that, in a preso I posted a
>> while ago: http://www.slideshare.net/srowen/matrix-factorization  Look
>> at the last few slides; I think it's kind of a useful / simple way to
>> think of it.
>>
>> Sean
>>
>> On Fri, Jun 29, 2012 at 10:27 PM, Chris Hokamp <chris.hokamp@gmail.com> wrote:
>> > Hi all,
>> >
>> > I'm trying to implement Latent Semantic Indexing using the Mahout ssvd
>> > tool, and I'm having trouble understanding how I can use the output of
>> > Mahout ssvd to 'fold' new queries (documents) into the LSI space.
>> > Specifically, I can't find a way to multiply a vector representing a
>> > query by the inverse of the matrix of singular values - I can't find a
>> > way to solve for the inverse of the diagonal matrix of singular values.
>> >
>> > I can generate the output matrices using ssvd, and compare document/term
>> > vectors using cosine similarity, but I'm stumped when it comes to folding
>> > a new document into the space.
>> >
>> > Any thoughts or guidance would be appreciated.
>> >
>> > Cheers,
>> > Chris
>>
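
Once a query has been folded in, it can be compared with the existing documents using cosine similarity, as Chris already does for document/term vectors. A minimal sketch, assuming the folded-in query and the rows of U live in the same k-dimensional space; whether to compare in plain U coordinates or to scale both sides by Σ first is a design choice that varies between LSI setups. The helpers `cosine` and `rank_documents` are hypothetical names, not Mahout APIs.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank_documents(query: Sequence[float],
                   U: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, most similar first."""
    scores = [(i, cosine(query, row)) for i, row in enumerate(U)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    U = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # toy document coordinates
    folded_query = [0.9, 0.1]                 # e.g. output of a fold-in step
    for doc_index, score in rank_documents(folded_query, U):
        print(doc_index, round(score, 3))
```

For large corpora you would normally precompute the row norms of U once instead of recomputing them per query.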
