mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Clustering single doc as multiple docs
Date Fri, 30 Apr 2010 17:24:02 GMT

On Apr 30, 2010, at 1:15 PM, Robin Anil wrote:

> On Fri, Apr 30, 2010 at 10:40 PM, Bogdan Vatkov <bogdan.vatkov@gmail.com> wrote:
> 
>> Hi Grant,
>> 
>> You are probably right.
>> What I wanted was to use my Mahout setup to extract topics from a single
>> document.
>> So, in popular terms, I am trying to do topic extraction via document
>> clustering.
>> Does it make sense to split a doc into sub-docs so that I can leverage
>> the clustering algorithm and thus find the topics that appear to be the key
>> ones for the document?
>> 
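Bogdan's idea of treating one document as many can be sketched in plain Python. This is a minimal illustration, not Mahout's clustering API or pipeline: it assumes sentence-window chunks as the sub-documents, raw term-frequency vectors, a small hand-picked stopword list, and k-means with cosine similarity and farthest-point initialization. The function names (split_into_subdocs, topics, and so on) are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "as", "that", "this", "be", "by"}

def split_into_subdocs(text, sentences_per_chunk=2):
    """Split one document into pseudo-documents of N consecutive sentences."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

def term_vector(text):
    """Bag-of-words term frequencies, lowercased, stopwords removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse term vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def init_centroids(vectors, k):
    """Farthest-point init: start with the first chunk, then repeatedly add the
    chunk least similar to every centroid chosen so far."""
    chosen = [0]
    while len(chosen) < k:
        nxt = min((i for i in range(len(vectors)) if i not in chosen),
                  key=lambda i: max(cosine(vectors[i], vectors[j]) for j in chosen))
        chosen.append(nxt)
    return [Counter(vectors[i]) for i in chosen]

def kmeans(vectors, k, iterations=10):
    """k-means where each chunk joins the centroid with the highest cosine similarity."""
    centroids = init_centroids(vectors, k)
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                # New centroid is the mean term vector of the cluster's chunks.
                centroids[c] = Counter({t: n / len(members) for t, n in total.items()})
    return assignment, centroids

def topics(text, k=2, top_n=3, sentences_per_chunk=2):
    """Return the top terms of each cluster centroid as candidate topics."""
    vectors = [term_vector(s) for s in split_into_subdocs(text, sentences_per_chunk)]
    if not vectors:
        return []
    _, centroids = kmeans(vectors, min(k, len(vectors)))
    return [[t for t, _ in c.most_common(top_n)] for c in centroids]
```

For example, topics("Cats purr. Cats sleep. Stocks rise. Markets trade stocks.", k=2, top_n=1, sentences_per_chunk=1) groups the two cat sentences apart from the two market sentences. Chunk size and k are the main knobs: chunks that are too short give sparse vectors that cluster poorly.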
> Have you heard of LDA (it's in Mahout)? Or are you trying to do something
> different for topic extraction?

LDA is more for finding topics across docs. You might also have a look at TextRank, which is a graph-based approach
to keyword/topic extraction that is nice to implement (one of these days, I'll do it in Mahout).
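The keyword-extraction variant of TextRank builds a graph of co-occurring words and scores each word with PageRank. The sketch below is a simplified, hedged version and not a Mahout API: it assumes a stopword filter in place of the part-of-speech filter TextRank normally uses, links words that are adjacent after filtering (window=2), runs a fixed number of iterations instead of testing for convergence, and skips the final step of merging adjacent keywords into phrases. textrank_keywords is a hypothetical name.

```python
import re
from collections import defaultdict

# Assumed stopword list standing in for a part-of-speech filter.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
             "it", "for", "on", "with", "as", "that", "this", "be", "by"}

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words by PageRank over an undirected word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    # Undirected, unweighted edge between words that co-occur within `window` tokens.
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    # PageRank update: S(w) = (1 - d) + d * sum over neighbours n of S(n) / deg(n).
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {w: (1 - damping)
                     + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
                  for w in graph}
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

Because the graph is built from a single text, this works on one document directly, without splitting it into sub-documents first. A word linked to many different neighbours (here, "graph" in "graph ranking graph keywords graph text") ends up ranked first.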