mahout-user mailing list archives

From Ken Krugler <>
Subject Re: How to define a topic for cluster.
Date Wed, 25 Aug 2010 17:30:44 GMT

On Aug 25, 2010, at 9:32am, Young wrote:

> I am using Mahout to cluster news articles, and I can see the top
> words for each cluster. But I am very keen to know how to define a
> topic for each cluster. Do we have to hardcode the topic for each
> cluster?
> I found an interesting site
> and they do excellent topic clustering based on the page content.

You can use Carrot2 to generate labels for clusters, but in my
experience it has issues when the individual elements in the
dataset are large. Carrot2 is optimized for clustering/labeling search
results, and seems to key off the phrases found in titles of web pages  
and search summaries.

Next I was going to try to derive SIPs (statistically improbable
phrases) from documents in the cluster, but we ran out of time on that.
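
To make the SIP idea concrete, here is a minimal sketch in Python. It is not something that was built or tested here, and it is only one possible reading of "statistically improbable": it scores two-word phrases by how much more often they occur in a cluster than in a background corpus, using a smoothed log ratio. The function names, the bigram-only restriction, the add-one smoothing, and the `min_count` cutoff are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text and return its adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return list(zip(words, words[1:]))


def improbable_phrases(cluster_docs, background_docs, top_n=3, min_count=2):
    """Rank bigrams by how over-represented they are in the cluster.

    Assumption: "improbable" is approximated by the log ratio of the
    add-one-smoothed bigram probability in the cluster to that in the
    background corpus. Other scores (e.g. log-likelihood ratio) would
    also be reasonable.
    """
    cluster = Counter(bg for doc in cluster_docs for bg in bigrams(doc))
    background = Counter(bg for doc in background_docs for bg in bigrams(doc))
    cluster_total = sum(cluster.values())
    background_total = sum(background.values())
    vocab = len(set(cluster) | set(background))

    scored = []
    for phrase, count in cluster.items():
        if count < min_count:
            continue  # skip phrases too rare in the cluster to trust
        p_cluster = (count + 1) / (cluster_total + vocab)
        p_background = (background[phrase] + 1) / (background_total + vocab)
        scored.append((math.log(p_cluster / p_background), " ".join(phrase)))

    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored[:top_n]]
```

Calling `improbable_phrases(docs_in_cluster, docs_in_rest_of_corpus)` returns candidate label phrases for that cluster; whether they read well as topic names would need to be checked on real data.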

-- Ken

Ken Krugler
+1 530-210-6378
e l a s t i c   w e b   m i n i n g
