Hmm. It seems I have plenty of negative results (nearly half of the
similarities). If I add 0.3, the most negative results end up near 0,
but this is not optimal...
Alternatively, I could map the results onto [0..1].
Any other suggestions or comments?
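
For comparison, here is a minimal sketch of two ways to bring cosine
similarities from [-1..1] into [0..1]. The function names and the sample
values are made up for illustration; nothing here comes from Mahout.

```python
def clamp_negative(sim):
    """Treat every similarity <= 0 as 'maximally dissimilar' (maps to 0)."""
    return max(0.0, sim)


def rescale_linear(sim):
    """Map [-1, 1] linearly onto [0, 1]; the ranking of pairs is unchanged."""
    return (sim + 1.0) / 2.0


# Hypothetical similarity values, chosen only to show the two behaviours.
sims = [-1.0, -0.5, 0.0, 0.5, 1.0]

clamped = [clamp_negative(s) for s in sims]
rescaled = [rescale_linear(s) for s in sims]

print(clamped)
print(rescaled)
```

Clamping discards the ordering among negative values, while the linear
map keeps it. Unlike adding a constant such as 0.3, neither depends on
the particular data set.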
Cheers
Stefan
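
The projection effect described in the quoted reply below can be checked
with a small sketch. It assumes the one-dimensional subspace is spanned
by (1,-1,0); the helper names are hypothetical and only the standard
library is used.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def project_onto(v, basis):
    """Orthogonal projection of v onto the line spanned by basis."""
    scale = dot(v, basis) / dot(basis, basis)
    return [scale * x for x in basis]


a = (1, 0, 1)
b = (0, 1, 1)
axis = (1, -1, 0)  # assumed spanning vector of the 1-d subspace

before = cosine(a, b)
after = cosine(project_onto(a, axis), project_onto(b, axis))

print(before, after)
```

Under this assumption the two non-negative vectors start with a positive
cosine of about 0.5 and end up pointing in opposite directions after
projection, so their cosine becomes about -1.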
2011/6/15 Jake Mannix <jake.mannix@gmail.com>:
> While your original vectors never had similarity less than zero, after
> projection onto the SVD space, you may "project away" similarities
> between two vectors, and they are now negatively correlated in this
> space (think about projecting (1,0,1) and (0,1,1) onto the 1-d vector
> space spanned by (1,-1,0): they go from having similarity +1/2
> to similarity -1).
>
> I always interpret all similarities <= 0 as "maximally dissimilar",
> even if technically -1 is where this is exactly true.
>
> jake
>
> On Wed, Jun 15, 2011 at 2:10 AM, Stefan Wienert <stefan@wienert.cc> wrote:
>
>> Ignoring is no option... so I have to interpret these values.
>> Can one say that documents with similarity = -1 are the least similar
>> documents? I don't think this is right.
>> Any other assumptions?
>>
>> 2011/6/15 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>:
>> > One question that I think has not been answered yet is that of the
>> > negative similarities. In the literature you can find that
>> > similarity = -1 means that "documents talk about opposite topics",
>> > but I think this is quite an abstract idea... I just ignore them;
>> > when I'm trying to find the top-k similar documents these surely
>> > won't be useful. I read recently that this has to do with the
>> > assumptions in SVD, which is designed for normal distributions (this
>> > implies the possibility of negative values). There are other
>> > techniques (non-negative factorization) that try to solve this. I
>> > don't know if there's something in Mahout about this.
>> >
>> > Best,
>> >
>> > Fernando.
>> >
>> > 2011/6/15 Ted Dunning <ted.dunning@gmail.com>
>> >
>> >> The normal terminology is to name U and V in SVD as "singular
>> >> vectors" as opposed to eigenvectors. The term eigenvectors is
>> >> normally reserved for the symmetric case of U S U' (more generally,
>> >> the Hermitian case, but we only support real values).
>> >>
>> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >>
>> >> > I beg to differ... U and V are left and right eigenvectors, and
>> >> > the singular values are denoted as Sigma (which are the square
>> >> > roots of the eigenvalues of AA', as you correctly pointed out).
>> >> >
>> >>
>> >
>>
>

Stefan Wienert
http://www.wienert.cc
stefan@wienert.cc
Telefon: +4952512026838
Mobil: +4917640170270
