mahout-user mailing list archives

From Rob Podolski <>
Subject Re: Word and Phrase Clustering
Date Fri, 02 Dec 2011 03:16:33 GMT
Have you looked at 'Taming Text' (by Grant S. Ingersoll, Thomas S. Morton, and Andrew L.
Farris)?  Some sections in it might be relevant to your problem.


From: Neil Chaudhuri
Sent: Friday, 2 December 2011, 3:08
Subject: Word and Phrase Clustering
I need to cluster a collection of words and phrases by syntactic similarity in a
distributed environment, and I came upon Mahout as a possible solution. After studying the
documentation, though, I find all of it tailored to working with entire documents rather
than words and phrases. I simply want to know whether you believe Mahout is the right tool
for this job. I suppose I could try to treat each word or phrase as an individual tiny document,
but that feels like I am forcing it.

Any insight is appreciated.
