mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: tf-idf + svd + cosine similarity
Date Wed, 15 Jun 2011 16:31:47 GMT
While your original vectors never had similarity less than zero, after
projection onto the SVD space you may "project away" the similarity
between two vectors, so that they become negatively correlated in that
space (think about projecting (1,0,1) and (0,1,1) onto the 1-d vector
space spanned by (1,-1,0): they go from having cosine similarity +1/2
to similarity -1).
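
To make the arithmetic in this example concrete, here is a small NumPy sketch. Python and NumPy are an illustrative choice; nothing in this thread uses them, and Mahout itself is Java. The sketch computes the cosine similarity of (1,0,1) and (0,1,1) before and after projecting both onto the unit vector along (1,-1,0), which plays the role of a single retained singular vector. The helper name `cosine` is hypothetical.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

a = np.array([1.0, 0.0, 1.0])
b = np.array([0.0, 1.0, 1.0])

# One-column orthonormal basis for the 1-d subspace spanned by (1, -1, 0).
basis = (np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)).reshape(3, 1)

# Coordinates of each vector in the reduced space, as a projection onto
# k retained singular vectors would give (here k = 1).
a_k = basis.T @ a
b_k = basis.T @ b

before = cosine(a, b)
after = cosine(a_k, b_k)
print(f"before projection: {before:.4f}")
print(f"after projection:  {after:.4f}")
```

It should report about 0.5 before projection and -1.0 after. The projected coordinates are +1/sqrt(2) and -1/sqrt(2), and in one dimension two non-zero numbers of opposite sign always have cosine -1.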

I always interpret all similarities <= 0 as "maximally dissimilar",
even if technically -1 is where this is exactly true.
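
One way to apply this rule when ranking neighbours is to clamp every cosine similarity at zero and never return a document whose clamped score is zero. The sketch assumes the documents are already rows of a dense NumPy array in the reduced SVD space. The function `top_k_similar`, its parameters and the toy vectors are hypothetical and are not part of Mahout.

```python
import numpy as np

def top_k_similar(doc_vectors, query_index, k):
    """Return up to k (index, similarity) pairs most similar to one row.

    Similarities <= 0 are treated as "maximally dissimilar": they are
    clamped to 0 and such documents are never returned as neighbours.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    unit = doc_vectors / np.where(norms == 0, 1.0, norms)
    sims = np.maximum(unit @ unit[query_index], 0.0)  # clamp at zero
    sims[query_index] = 0.0                           # skip the query itself
    order = np.argsort(-sims, kind="stable")
    return [(int(i), float(sims[i])) for i in order[:k] if sims[i] > 0.0]

# Toy documents in a 2-d reduced space (made-up values).
docs = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [-1.0, 0.2],   # negatively correlated with document 0
    [0.0, 1.0],    # orthogonal to document 0
])
neighbours = top_k_similar(docs, query_index=0, k=3)
print(neighbours)
```

With this data only document 1 is returned. Documents 2 and 3 have similarity below or equal to zero and are dropped rather than ranked.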

  -jake

On Wed, Jun 15, 2011 at 2:10 AM, Stefan Wienert <stefan@wienert.cc> wrote:

> Ignoring them is not an option, so I have to interpret these values.
> Can one say that documents with similarity = -1 are the least similar
> documents? I don't think this is right.
> Any other ideas?
>
> 2011/6/15 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>:
> > One question that I think has not been answered yet is that of
> > negative similarities. In the literature you can find that
> > similarity = -1 means that "documents talk about opposite topics", but
> > I think this is a rather abstract idea... I just ignore them: when I'm
> > trying to find the top-k similar documents, these surely won't be
> > useful. I read recently that this has to do with the assumptions in
> > SVD, which is designed for normal distributions (this implies the
> > possibility of negative values). There are other techniques
> > (non-negative matrix factorization) that try to solve this. I don't
> > know if there's something in Mahout for this.
> >
> > Best,
> >
> > Fernando.
> >
> > 2011/6/15 Ted Dunning <ted.dunning@gmail.com>
> >
> >> The normal terminology is to call the columns of U and V in the SVD
> >> "singular vectors", as opposed to eigenvectors. The term eigenvectors
> >> is normally reserved for the symmetric case of U S U' (more generally,
> >> the Hermitian case, but we only support real values).
> >>
> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >>
> >> > I beg to differ... U and V are left and right eigenvectors, and the
> >> > singular values are denoted as Sigma (they are the square roots of
> >> > the eigenvalues of AA', as you correctly pointed out).
> >> >
> >>
> >
>
>
>
> --
> Stefan Wienert
>
> http://www.wienert.cc
> stefan@wienert.cc
>
> Telefon: +495251-2026838
> Mobil: +49176-40170270
>
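
Fernando's pointer to non-negative matrix factorization can be illustrated with the classic Lee and Seung multiplicative-update rules for the squared Frobenius loss. This is a minimal NumPy sketch, not Mahout code; the thread leaves open whether Mahout offers NMF. The function `nmf`, its parameters and the toy matrix are hypothetical. Both factors stay non-negative, so the cosine similarity between any two document rows of W can never drop below zero. That sidesteps the question of how to read negative similarities.

```python
import numpy as np

def nmf(A, rank, iterations=500, eps=1e-10, seed=0):
    """Approximate a non-negative matrix A (documents x terms) as W @ H.

    Uses Lee-Seung multiplicative updates for ||A - W H||_F^2. W and H
    start strictly positive and are only multiplied by non-negative
    ratios, so they remain non-negative.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = A.shape
    W = rng.random((n_docs, rank)) + eps
    H = rng.random((rank, n_terms)) + eps
    for _ in range(iterations):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy, made-up tf-idf-like weights: 4 documents x 4 terms.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 1.0],
])
W, H = nmf(A, rank=2)

# Cosine similarity between documents in the non-negative topic space.
norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
W_unit = W / norms
doc_sims = W_unit @ W_unit.T
print(np.round(doc_sims, 3))
```

If a Python toolchain is acceptable, scikit-learn's `sklearn.decomposition.NMF` is a maintained implementation of the same idea.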

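The two points of view in the exchange between Ted and Dmitriy are compatible, and a few lines of NumPy can check the relationship numerically. NumPy is again an illustrative choice, not part of Mahout. For a real matrix A = U Sigma V', the following hold:

- The columns of U are eigenvectors of the symmetric matrix AA'.
- The columns of V are eigenvectors of A'A.
- The singular values are the square roots of the corresponding eigenvalues.

The random test matrix below is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((5, 3))  # arbitrary real matrix, e.g. documents x terms

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals = np.linalg.eigh(A @ A.T)[0]
largest = eigvals[::-1][: len(s)]

# Singular values are the square roots of the largest eigenvalues of A A'.
sv_match = np.allclose(s, np.sqrt(largest))
# (A A') u_i = s_i^2 u_i and (A' A) v_i = s_i^2 v_i for every column i.
left_are_eigvecs = np.allclose(A @ A.T @ U, U * s**2)
right_are_eigvecs = np.allclose(A.T @ A @ V, V * s**2)
print(sv_match, left_are_eigvecs, right_are_eigvecs)
```

So U and V are properly the left and right singular vectors of A, as Ted says. They are also eigenvectors of AA' and A'A respectively, which matches Dmitriy's description.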