On Wed, Jun 15, 2011 at 10:06 AM, Stefan Wienert <stefan@wienert.cc> wrote:
> Hmm. Seems I have plenty of negative results (nearly half of the
> similarities). If I add +0.3, the most negative results end up
> near 0. This is not optimal...
> I could also project the results onto [0..1].
>
Looking for *dissimilar* results seems odd. What are you trying to do?
What people normally do is look for clusters of similar documents, or
just the top-N most similar documents to each document. In both of these
cases, you don't care about the documents whose similarity to anyone is
zero, or less than zero.
jake
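
A minimal sketch of the top-N approach described above, assuming the
cosine similarities for one query document are already available as a
plain dict. The function name top_n_similar, the document ids, and the
scores are all made up for illustration; nothing here is Mahout API.

```python
import heapq


def top_n_similar(similarities, n):
    """Return up to n (doc_id, score) pairs with score > 0, best first.

    `similarities` maps a document id to its cosine similarity with the
    query document. Scores <= 0 are treated as "not similar" and dropped.
    """
    positive = ((doc, s) for doc, s in similarities.items() if s > 0.0)
    return heapq.nlargest(n, positive, key=lambda pair: pair[1])


# Hypothetical scores for one query document after SVD projection.
scores = {
    "doc_b": 0.82,
    "doc_c": -0.31,
    "doc_d": 0.05,
    "doc_e": 0.0,
    "doc_f": 0.47,
}

top = top_n_similar(scores, 3)
for doc, score in top:
    print(doc, score)
```

With this filter the negative and zero scores never reach the result
list, so there is no need to shift or rescale them.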
> Any other suggestions or comments?
>
> Cheers
> Stefan
>
> 2011/6/15 Jake Mannix <jake.mannix@gmail.com>:
> > While your original vectors never had similarity less than zero, after
> > projection onto the SVD space, you may "project away" similarities
> > between two vectors, and they are now negatively correlated in this
> > space (think about projecting (1,0,1) and (0,1,1) onto the 1-d vector
> > space spanned by (1,-1,0): they go from having similarity +1/2
> > to similarity -1).
> >
> > I always interpret all similarities <= 0 as "maximally dissimilar",
> > even if technically -1 is where this is exactly true.
> >
> > jake
> >
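
The projection example quoted just above can be checked numerically. This
is a small sketch in plain Python, assuming the spanning direction is
(1,-1,0) (my reading of the example) and normalising it to unit length;
the helper names dot, cosine and project are illustrative only.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def project(v, basis):
    """Coordinates of v in the subspace spanned by orthonormal basis vectors."""
    return [dot(v, b) for b in basis]


a = (1.0, 0.0, 1.0)
b = (0.0, 1.0, 1.0)

# Assumed 1-d subspace: the direction (1, -1, 0), scaled to unit length.
s = 1.0 / math.sqrt(2.0)
basis = [(s, -s, 0.0)]

before = cosine(a, b)                                  # in the original 3-d space
after = cosine(project(a, basis), project(b, basis))   # after projection

print("before projection:", before)
print("after projection: ", after)
```

The shared third coordinate is what made the two vectors similar. The
subspace ignores that coordinate, so only the opposing first two
coordinates survive: the cosine goes from about 0.5 to about -1.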
> > On Wed, Jun 15, 2011 at 2:10 AM, Stefan Wienert <stefan@wienert.cc> wrote:
> >
> >> Ignoring is no option... so I have to interpret these values.
> >> Can one say that documents with similarity = -1 are the least similar
> >> documents? I don't think this is right.
> >> Any other assumptions?
> >>
> >> 2011/6/15 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>:
> >> > One question that I think has not been answered yet is that of the
> >> > negative similarities. In the literature you can find that
> >> > similarity = -1 means that "documents talk about opposite topics",
> >> > but I think this is quite an abstract idea... I just ignore them; when
> >> > I'm trying to find the top-k similar documents these surely won't be
> >> > useful. I read recently that this has to do with the assumptions in
> >> > SVD, which is designed for normal distributions (this implies the
> >> > possibility of negative values). There are other techniques
> >> > (non-negative factorization) that try to solve this. I don't know if
> >> > there's something in Mahout about this.
> >> >
> >> > Best,
> >> >
> >> > Fernando.
> >> >
> >> > 2011/6/15 Ted Dunning <ted.dunning@gmail.com>
> >> >
> >> >> The normal terminology is to name U and V in SVD as "singular vectors"
> >> >> as opposed to eigenvectors. The term eigenvectors is normally reserved
> >> >> for the symmetric case of U S U' (more generally, the Hermitian case,
> >> >> but we only support real values).
> >> >>
> >> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> >> >>
> >> >> > I beg to differ... U and V are left and right eigenvectors, and
> >> >> > the singular values are denoted as Sigma (they are the square roots
> >> >> > of the eigenvalues of AA', as you correctly pointed out).
> >> >> >
> >> >>
> >> >
> >>
> >>
> >>
> >> 
> >> Stefan Wienert
> >>
> >> http://www.wienert.cc
> >> stefan@wienert.cc
> >>
> >> Telefon: +4952512026838
> >> Mobil: +4917640170270
> >>
> >
>
