mahout-user mailing list archives

From Mattias Hilliges <hilli...@neofonie.de>
Subject Problems with AbstractSimilarity
Date Mon, 26 Apr 2010 13:02:03 GMT
Hi,
I noticed the following behaviour, which seems a bit strange to me:
Let v=(v1, v2, ..., vn) and w=(w1, w2, ..., wm) be the vectors used to
compute the similarity between two items/users. If all vi that overlap
with w (that is, vi!=0 and wi!=0) are equal, and all wj that
overlap with v are equal, neither the Euclidean nor the Pearson
similarity can be computed.

The attached test considers the following vectors: v=(0.2, 0.2, 0.4) and
w=(0.7, 0.7, 0). The overlapping vector components of v are all 0.2. The
overlapping components of w are all 0.7.

The problem is that "double computeResult(int n, double sumXY, double
sumX2, double sumY2, double sumXYdiff2)" in the corresponding subclass
of AbstractSimilarity is called with sumXY=sumX2=sumY2=0 and
therefore returns Double.NaN. This contradicts the behaviour
described in the book "Mahout in Action", p. 49. The last complete
sentence there is: "Note that we were able compute some notion of
similarity for all pairs of users here, whereas the Pearson correlation
couldn't produce an answer for users 1 and 3." Because of the problem
described above, the Euclidean similarity can't produce an answer
either. The book's case is a special case of this problem, one in which
there is only a single overlap.
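
To make the effect easy to reproduce outside Mahout, here is a minimal sketch. It is not Mahout's code: the class OverlapSimilaritySketch and its methods are hypothetical, and it assumes that the sums are accumulated over mean-centered values of the overlapping components only, which would explain why sumXY, sumX2 and sumY2 all come out as 0 for the vectors above.

```java
// Hypothetical stand-alone sketch, NOT Mahout's AbstractSimilarity.
// Assumption: sums are taken over mean-centered overlapping components.
public class OverlapSimilaritySketch {

    /**
     * Returns {n, sumXY, sumX2, sumY2} over the components where both
     * vectors are non-zero, after subtracting each vector's mean over
     * exactly those overlapping components.
     */
    public static double[] centeredSums(double[] v, double[] w) {
        int len = Math.min(v.length, w.length);
        int n = 0;
        double meanV = 0.0;
        double meanW = 0.0;
        for (int i = 0; i < len; i++) {
            if (v[i] != 0.0 && w[i] != 0.0) {
                n++;
                meanV += v[i];
                meanW += w[i];
            }
        }
        if (n == 0) {
            return new double[] {0.0, 0.0, 0.0, 0.0};
        }
        meanV /= n;
        meanW /= n;

        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (int i = 0; i < len; i++) {
            if (v[i] != 0.0 && w[i] != 0.0) {
                double x = v[i] - meanV;
                double y = w[i] - meanW;
                sumXY += x * y;
                sumX2 += x * x;
                sumY2 += y * y;
            }
        }
        return new double[] {n, sumXY, sumX2, sumY2};
    }

    /** Textbook Pearson correlation on the centered sums; 0/0 gives NaN. */
    public static double pearson(double[] v, double[] w) {
        double[] s = centeredSums(v, w);
        return s[1] / Math.sqrt(s[2] * s[3]);
    }

    public static void main(String[] args) {
        double[] v = {0.2, 0.2, 0.4};
        double[] w = {0.7, 0.7, 0.0};
        // Overlap is the first two components; both are constant there,
        // so every centered value is 0 and the result is 0.0 / 0.0.
        System.out.println(pearson(v, w)); // prints NaN
    }
}
```

With only one overlapping component the centered values are trivially 0 as well, which is why the single-overlap case behaves the same way in this sketch.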

Regards,
Mattias

-- 
--------------------------------
Mattias Hilliges
Softwareentwicklung
Forschung und Entwicklung

neofonie
Technologieentwicklung und
Informationsmanagement GmbH
Robert-Koch-Platz 4
10115 Berlin
fon: +49.30 24627 100
fax: +49.30 24627 120
mattias.hilliges@neofonie.de
http://www.neofonie.de 

Handelsregister
Berlin-Charlottenburg: HRB 67460

Geschaeftsfuehrung
Helmut Hoffer von Ankershoffen
(Sprecher der Geschaeftsfuehrung)
Nurhan Yildirim
--------------------------------

