mahout-user mailing list archives

From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: tf-idf + svd + cosine similarity
Date Tue, 14 Jun 2011 23:23:04 GMT
thanks, Jake.

On Tue, Jun 14, 2011 at 4:09 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> On Tue, Jun 14, 2011 at 3:35 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>> Normalization means that the 2-norm of every column in the eigenvector
>> matrix is 1. In the classic SVD A=U*Sigma*V', even a thin one, U and V
>> have orthonormal columns. I might be wrong, but I was under the
>> impression that I saw some discussion saying the Lanczos singular
>> vector matrix is not necessarily orthonormal (although its columns do
>> form an orthogonal basis).
>>
>
> LanczosSolver normalizes the singular vectors (LanczosSolver.java, line 162),
> and yes, it returns V, not U. Here U is documents x latent factors (it gives
> the projection of each input document onto the reduced basis), and V is
> latent factors x terms (its rows show which terms make up each latent
> factor). The Lanczos solver doesn't keep track of documents (partly for
> scalability: documents can be thought of as "training" your latent factor
> model); instead it returns the latent factor by term "model": V.
>
>  -jake
>
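
To make the shapes concrete, here is a minimal pure-Python sketch of how a
document could be projected onto the reduced basis when only V is available,
and how cosine similarity could then be computed in that space. It is not
Mahout code: the matrix V, the singular values in sigma, the toy documents,
and the helper names (normalize, project, cosine) are all made up for
illustration. It assumes V is latent factors x terms with unit-norm,
mutually orthogonal rows, and that the projection is u = a * V' * Sigma^-1;
dropping the Sigma^-1 scaling is an equally common choice.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale v so that its 2-norm is 1."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def project(doc, V, sigma):
    """Project a term vector onto the latent factors: u_k = (doc . v_k) / sigma_k."""
    return [dot(doc, v_k) / s_k for v_k, s_k in zip(V, sigma)]

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical "model": 2 latent factors over 3 terms, rows normalized.
V = [normalize([1.0, 1.0, 0.0]), normalize([0.0, 0.0, 2.0])]
sigma = [2.0, 1.0]  # hypothetical singular values

# Hypothetical documents as (tf-idf-like) term vectors.
doc_a = [1.0, 1.0, 0.0]
doc_b = [2.0, 2.0, 0.0]
doc_c = [0.0, 0.0, 3.0]

u_a = project(doc_a, V, sigma)
u_b = project(doc_b, V, sigma)
u_c = project(doc_c, V, sigma)

sim_ab = cosine(u_a, u_b)  # same direction in latent space
sim_ac = cosine(u_a, u_c)  # orthogonal in latent space
```

In this toy setup doc_a and doc_b load on the same latent factor, so their
similarity comes out as 1 (up to rounding), while doc_a and doc_c share no
factor and come out as 0.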
