Pat,
Here is an example from the output of the rowsimilarity job for a corpus I am working with
(using Cosine Similarity).
Key: 25: Value: {27433:0.9999999999999994}
What this means is that Document# 25 is similar to Document# 27433 by a factor of 0.999.
Since Distance = (1 - Similarity), this means that the distance between documents 25 and 27433
above is effectively 0 (= 1 - 0.9999999999999994), or in other words they are very similar.
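
To make the relationship concrete, here is a minimal Python sketch of cosine similarity and the matching distance. It is not Mahout's code: the function names and the two small vectors are made up for illustration, and it assumes plain dense vectors rather than the sparse rows the job works on.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Distance = 1 - Similarity."""
    return 1.0 - cosine_similarity(a, b)

# Hypothetical term-weight vectors: doc_b is doc_a scaled by 2, so the two
# point in the same direction and should be (almost exactly) similarity 1.
doc_a = [1.0, 2.0, 0.0, 3.0]
doc_b = [2.0, 4.0, 0.0, 6.0]

similarity = cosine_similarity(doc_a, doc_b)
distance = cosine_distance(doc_a, doc_b)

# The value reported in the output above, converted to a distance.
reported_similarity = 0.9999999999999994
reported_distance = 1.0 - reported_similarity
```

A similarity a hair below 1, like the one in the output above, is usually just floating-point rounding, and the resulting distance is a tiny positive number rather than exactly 0.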
Hope that clarifies.
Suneel
What is the value created to describe similarity by RowSimilarityJob? The paper which describes
how the algorithm is implemented doesn't describe the various similarity values returned by
mahout. It seems to focus on cooccurrences.
For SIMILARITY_COSINE is the value = cosine or 1 - cosine?
Is the value calculated after cooccurrences determines similar docs independently?
The code is very difficult to read so a little help would be appreciated.
