What is the reasoning behind PearsonCorrelationSimilarity returning
NaN for userSimilarity when the two users' overlapping ratings match
up perfectly?
With a limited set of rating values (1 to 5 stars), it seems
quite possible that a user with only a few ratings will have
overlapping ratings identical to another user's. Am I missing something here?
// Note that sum of X and sum of Y don't appear here since they are
// assumed to be 0; the data is assumed to be centered.
double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
if (denominator == 0.0) {
  // One or both parties has -all- the same ratings;
  // can't really say much similarity under this measure
  return Double.NaN;
}
return sumXY / denominator;
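
To make the case concrete, here is a minimal standalone sketch of the same computation. It is not the library's code: the class PearsonSketch and the method pearson are hypothetical names, and I am assuming "centered" means each user's mean over the overlapping items has been subtracted. Under that assumption, identical ratings that vary across items give a correlation of about 1, while ratings that are all the same value (or a single overlapping item) center to zero and hit the NaN branch.

```java
class PearsonSketch {

    // Hypothetical helper (not the library's API): Pearson correlation over
    // the ratings two users gave to the same items, in the same order.
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Center the data (assumption: mean over the overlap is subtracted),
        // then accumulate the same sums used in the quoted snippet.
        double sumXY = 0.0;
        double sumX2 = 0.0;
        double sumY2 = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sumXY += dx * dy;
            sumX2 += dx * dx;
            sumY2 += dy * dy;
        }

        double denominator = Math.sqrt(sumX2) * Math.sqrt(sumY2);
        if (denominator == 0.0) {
            // At least one user has no variation over the overlap.
            return Double.NaN;
        }
        return sumXY / denominator;
    }

    public static void main(String[] args) {
        // Identical but constant ratings: both center to all zeros.
        System.out.println(pearson(new double[] {4, 4, 4}, new double[] {4, 4, 4}));
        // A single overlapping item: also centers to zero.
        System.out.println(pearson(new double[] {5}, new double[] {5}));
        // Identical ratings that vary: correlation is approximately 1.
        System.out.println(pearson(new double[] {3, 4, 5}, new double[] {3, 4, 5}));
    }
}
```

So if this sketch reflects what the library does, a perfect match only yields NaN when one user's overlapping ratings have zero variance, which is easy to hit with few ratings on a 1 to 5 scale.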