mahout-user mailing list archives

From Stefan Wienert <ste...@wienert.cc>
Subject Re: tf-idf + svd + cosine similarity
Date Wed, 15 Jun 2011 09:10:44 GMT
Ignoring them is not an option, so I have to interpret these values.
Can one say that documents with similarity = -1 are the least similar
documents? I don't think this is right.
Any other interpretations?
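
A small sketch of where such negative values can come from, using NumPy. The matrix A below is a made-up, nonnegative document-term matrix (raw counts, not real tf-idf weights), and the names `cosine`, `docs_k` and `k` are only illustrative. Documents 0 and 1 share no terms, so their cosine in the original space is 0; after keeping only the top k = 2 singular directions, the cosine between the same two documents becomes negative.

```python
import numpy as np

# Toy document-term matrix (rows = documents). All entries are nonnegative.
# These numbers are invented for illustration; they are not from the thread.
A = np.array([
    [2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0, 2.0],
])

def cosine(x, y):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k document coordinates: the first k columns of U, scaled by s.
k = 2
docs_k = U[:, :k] * s[:k]

cos_original = cosine(A[0], A[1])          # documents share no terms
cos_reduced = cosine(docs_k[0], docs_k[1])  # after dropping the 3rd direction

print("original space:", cos_original)
print("rank-2 space:  ", cos_reduced)
```

In this toy case the original cosine is 0 and the rank-2 cosine is about -0.36. The dropped third direction contributed a positive amount to the inner product of the two documents, so removing it pushes the value below zero. Under that reading, a negative value here is an effect of the truncation rather than evidence of "opposite" content, although whether that carries over to a real corpus is an assumption.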

2011/6/15 Fernando Fernández <fernando.fernandez.gonzalez@gmail.com>:
> One question that I think has not been answered yet is that of the
> negative similarities. In the literature you can find that similarity = -1
> means that "documents talk about opposite topics", but I think this is quite
> an abstract idea... I just ignore them; when I'm trying to find the top-k
> similar documents these surely won't be useful. I read recently that this
> has to do with the assumptions in SVD, which is designed for normal
> distributions (this implies the possibility of negative values). There are
> other techniques (non-negative factorization) that try to solve this. I
> don't know if there's something in Mahout about this.
>
> Best,
>
> Fernando.
>
> 2011/6/15 Ted Dunning <ted.dunning@gmail.com>
>
>> The normal terminology is to name U and V in SVD as "singular vectors" as
>> opposed to eigenvectors.  The term eigenvectors is normally reserved for the
>> symmetric case of U S U'  (more generally, the Hermitian case, but we only
>> support real values).
>>
>> On Wed, Jun 15, 2011 at 12:35 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>> > I beg to differ... U and V are the left and right eigenvectors, and the
>> > singular values are denoted Sigma (the square roots of the eigenvalues
>> > of AA', as you correctly pointed out).
>> >
>>
>
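
The relationship debated in the quoted messages can be checked numerically. The sketch below assumes NumPy and a random real 4x6 matrix (not data from this thread). It confirms that the singular values of A are the square roots of the eigenvalues of AA', that the columns of U are eigenvectors of AA', and that the columns of V are eigenvectors of A'A, which is why both "singular vectors" and "eigenvectors" turn up in the discussion.

```python
import numpy as np

# Random real matrix, used only for illustration (seed fixed for repeatability).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))

# Thin SVD: A = U @ diag(s) @ Vt, with s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Eigenvalues of the symmetric matrix AA'. eigh returns them in ascending
# order, so reverse them to match the ordering of the singular values.
eigvals, _ = np.linalg.eigh(A @ A.T)
eigvals_desc = eigvals[::-1]

# Singular values are the square roots of the eigenvalues of AA'.
sigma_matches = np.allclose(s, np.sqrt(eigvals_desc))

# Columns of U are eigenvectors of AA':  (AA') U = U diag(s^2).
u_are_eigvecs = np.allclose(A @ A.T @ U, U * s**2)

# Columns of V are eigenvectors of A'A:  (A'A) V = V diag(s^2).
V = Vt.T
v_are_eigvecs = np.allclose(A.T @ A @ V, V * s**2)

print(sigma_matches, u_are_eigvecs, v_are_eigvecs)
```

So U and V are eigenvectors of AA' and A'A respectively, but not of A itself, which in general is not even square.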



-- 
Stefan Wienert

http://www.wienert.cc
stefan@wienert.cc

Telefon: +495251-2026838
Mobil: +49176-40170270
