mahout-user mailing list archives

From Stefan Wienert <ste...@wienert.cc>
Subject Re: tf-idf + svd + cosine similarity
Date Wed, 15 Jun 2011 17:06:06 GMT
Hmm. It seems I have plenty of negative results (nearly half of the
similarities). I could add +0.3 so that the most negative results end up
near 0, but that is not optimal.
Alternatively, I could rescale the results to [0..1].
Any other suggestions or comments?
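
To make the options concrete, here is a minimal sketch of the two mappings
discussed in this thread: clipping everything <= 0 to 0 (the "maximally
dissimilar" reading) and linearly rescaling [-1, 1] onto [0, 1]. The function
names and the sample values are made up for illustration, not taken from
Mahout.

```python
def clip_negative(sim):
    """Treat every non-positive cosine as 'maximally dissimilar' (score 0)."""
    return max(0.0, sim)


def rescale_unit(sim):
    """Map a cosine in [-1, 1] linearly onto [0, 1]."""
    return (sim + 1.0) / 2.0


# Hypothetical cosine similarities after the SVD projection.
sims = [-1.0, -0.3, 0.0, 0.5, 1.0]

clipped = [clip_negative(s) for s in sims]
rescaled = [rescale_unit(s) for s in sims]

print(clipped)
print(rescaled)
```

Note that a constant shift (like +0.3) and the linear rescaling both preserve
the ordering of the similarities, so a top-k ranking is unchanged by either;
clipping only collapses the negative values into ties at 0.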

Cheers
Stefan

2011/6/15 Jake Mannix <jake.mannix@gmail.com>:
> While your original vectors never had similarity less than zero, after
> projection onto the SVD space you may "project away" the similarity
> between two vectors, so that they become negatively correlated in this
> space (think about projecting (1,0,1) and (0,1,1) onto the 1-d vector
> space spanned by (1,-1,0): they go from having similarity +1/2
> to similarity -1).
>
> I always interpret all similarities <= 0 as "maximally dissimilar",
> even if technically -1 is where this is exactly true.
>
>  -jake
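
The projection example above can be checked numerically. This is only a
sketch in plain Python; the helper names (dot, cosine, project) are my own
and the vectors are the ones from the example.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def project(v, direction):
    """Orthogonal projection of v onto the line spanned by direction."""
    scale = dot(v, direction) / dot(direction, direction)
    return [scale * d for d in direction]


a, b = (1, 0, 1), (0, 1, 1)
axis = (1, -1, 0)

before = cosine(a, b)                               # cosine in the original space
after = cosine(project(a, axis), project(b, axis))  # cosine after projection

print(before, after)
```

Up to floating-point rounding, the original pair has cosine 0.5 and the
projected pair has cosine -1: the shared third coordinate, which carried all
of the similarity, is discarded by the projection, and the two projections
point in opposite directions along the axis.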
>
> On Wed, Jun 15, 2011 at 2:10 AM, Stefan Wienert <stefan@wienert.cc> wrote:
>
>> Ignoring them is not an option, so I have to interpret these values.
>> Can one say that documents with similarity = -1 are the least similar
>> documents? I don't think this is right.
>> Any other suggestions?
>>
>> 2011/6/15 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>:
>> > One question that I think has not been answered yet is that of the
>> > negative similarities. In the literature you can find that
>> > similarity = -1 means that "documents talk about opposite topics", but
>> > I think this is quite an abstract idea... I just ignore them; when I'm
>> > trying to find the top-k similar documents, these surely won't be
>> > useful. I read recently that this has to do with the assumptions in
>> > SVD, which is designed for normal distributions (this implies the
>> > possibility of negative values). There are other techniques
>> > (non-negative matrix factorization) that try to solve this. I don't
>> > know if there's something in Mahout for this.
>> >
>> > Best,
>> >
>> > Fernando.
>> >
>> > 2011/6/15 Ted Dunning <ted.dunning@gmail.com>
>> >
>> >> The normal terminology is to name U and V in SVD as "singular vectors"
>> >> as opposed to eigenvectors.  The term eigenvectors is normally reserved
>> >> for the symmetric case of U S U'  (more generally, the Hermitian case,
>> >> but we only support real values).
>> >>
>> >> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>> >>
>> >> > I beg to differ... U and V are left and right eigenvectors, and the
>> >> > singular values are denoted as Sigma (the square roots of the
>> >> > eigenvalues of AA', as you correctly pointed out).
>> >> >
>> >>
>> >
>>
>>
>>
>> --
>> Stefan Wienert
>>
>> http://www.wienert.cc
>> stefan@wienert.cc
>>
>> Telefon: +495251-2026838
>> Mobil: +49176-40170270
>>
>


