mahout-user mailing list archives

From Fernando Fernández <fernando.fernandez.gonza...@gmail.com>
Subject Re: tf-idf + svd + cosine similarity
Date Wed, 15 Jun 2011 08:44:34 GMT
One question that I think has not been answered yet is that of the
negative similarities. In the literature you can find that similarity = -1
means that "documents talk about opposite topics", but I think this is quite
an abstract idea... I just ignore them: when I'm trying to find the top-k
similar documents, these surely won't be useful. I read recently that this has
to do with the assumptions in SVD, which is designed for normal distributions
(this implies the possibility of negative values). There are other techniques
(non-negative matrix factorization) that try to solve this. I don't know if
there's something in Mahout about this.
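
To make the "just ignore them" idea concrete, here is a minimal sketch in plain Python. It is not Mahout code: the names cosine, top_k_similar and reduced_docs, and the 2-dimensional vectors themselves, are made up for illustration and only stand in for documents after an SVD projection.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_similar(query_id, doc_vectors, k):
    """Return up to k (doc_id, similarity) pairs, best first.

    Documents whose similarity to the query is zero or negative are dropped,
    so fewer than k results may come back.
    """
    query = doc_vectors[query_id]
    scored = (
        (doc_id, cosine(query, vec))
        for doc_id, vec in doc_vectors.items()
        if doc_id != query_id
    )
    positive = [(doc_id, sim) for doc_id, sim in scored if sim > 0.0]
    return heapq.nlargest(k, positive, key=lambda pair: pair[1])


# Hypothetical document vectors in a 2-dimensional reduced space.
# Unlike raw tf-idf vectors, their coordinates may be negative.
reduced_docs = {
    "d1": [0.9, 0.1],
    "d2": [0.8, 0.3],
    "d3": [-0.7, 0.2],
    "d4": [-0.9, -0.1],  # exactly opposite to d1
}

neighbours = top_k_similar("d1", reduced_docs, k=3)
for doc_id, sim in neighbours:
    print(doc_id, round(sim, 3))
```

With these made-up vectors, d3 and d4 have negative similarity to d1 and are filtered out, so asking for three neighbours returns only d2. Whether dropping them or ranking them last is the better choice probably depends on the application.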

Best,

Fernando.

2011/6/15 Ted Dunning <ted.dunning@gmail.com>

> The normal terminology is to name U and V in SVD as "singular vectors" as
> opposed to eigenvectors.  The term eigenvectors is normally reserved for the
> symmetric case of U S U'  (more generally, the Hermitian case, but we only
> support real values).
>
> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>
> > I beg to differ... U and V are left and right eigenvectors, and the
> > singular values are denoted as Sigma (they are the square roots of the
> > eigenvalues of AA', as you correctly pointed out).
> >
>
