To answer a few recent points:
Not sure if this is helpful, but, the collaborative filtering part of
Mahout contains an implementation of the cosine distance measure, sort
of. Really it has an implementation of the Pearson correlation, which
is equivalent to the cosine measure if the data are 'centered' (have a
mean of 0). Centering is, in my opinion, a good idea. So if you agree,
you could copy and adapt
this implementation of Pearson to your purpose. It is pretty easy to
recreate the actual cosine distance measure from this code too; I used
to have it separately in the code.
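
To illustrate the equivalence, here is a minimal sketch in plain Python
(not Mahout's actual code; the function names are made up for this
example). It computes Pearson from the textbook formula and cosine from
the dot product, so you can check that they agree once the vectors are
centered. It assumes equal-length vectors that are not constant or all
zero.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Assumption: neither vector is all zeros, otherwise this divides by zero.
    return dot / (norm_x * norm_y)


def center(x):
    """Subtract the mean so the vector has a mean of 0."""
    mean = sum(x) / len(x)
    return [a - mean for a in x]


def pearson_correlation(x, y):
    """Pearson correlation: covariance over the product of std deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Assumption: neither vector is constant, otherwise this divides by zero.
    return cov / math.sqrt(var_x * var_y)
```

Calling pearson_correlation(x, y) and cosine_similarity(center(x),
center(y)) on the same pair of vectors should give the same value up to
floating-point error, while cosine_similarity(x, y) on the uncentered
vectors will generally differ.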
The Tanimoto distance is a ratio of intersection to union of two sets,
so is between 0 and 1. Cosine distance is, essentially, the cosine of
an angle in feature space, so is between -1 and 1.
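
The two ranges can be checked with a small sketch, again in plain
Python rather than Mahout's own classes. The function names are
hypothetical, and returning 0.0 for two empty sets is my own convention
for the example, not something taken from Mahout.

```python
import math


def tanimoto_coefficient(a, b):
    """Ratio of intersection size to union size of two sets (0 to 1)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption for this sketch: two empty sets score 0.0.
        return 0.0
    return len(a & b) / len(union)


def cosine_of_angle(x, y):
    """Cosine of the angle between two equal-length vectors (-1 to 1)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    # Assumption: neither vector is all zeros.
    return dot / (norm_x * norm_y)
```

Identical sets give a Tanimoto value of 1 and disjoint sets give 0.
Vectors pointing in opposite directions give a cosine of -1, which can
only happen when features are allowed to be negative (for example after
centering); with non-negative term counts the cosine stays between 0
and 1.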
On Sat, Dec 6, 2008 at 12:54 PM, Philippe Lamarche
<philippe.lamarche@gmail.com> wrote:
> Hi,
>
> I used the Tanimoto distance. As I understand it, it's almost like the
> cosine distance, with a range between 0 and infinity as opposed to 0 and
> 3.14. Seems to work well.
>
>
>
>
> On Fri, Dec 5, 2008 at 11:54 PM, dipesh <dipshrestha@gmail.com> wrote:
>
>> Hi Philippe,
>>
>> I'm also doing some work on text clustering with feature extraction. For
>> text clustering the Cosine Distance is considered a better similarity
>> metric than the Euclidean Distance Measure. I couldn't find
>> CosineDistanceMeasure in Mahout, did you use Cosine Distance Measure in your
>> clustering project?
