mahout-user mailing list archives

From Bogdan Vatkov <bogdan.vat...@gmail.com>
Subject Re: Clustering single doc as multiple docs
Date Fri, 30 Apr 2010 17:32:23 GMT
I will check it, but I am not sure I have the knowledge to
implement it. Is there a ready-to-use implementation somewhere?
Btw, why do you think splitting and clustering won't work? Has anybody
tried this?
I am not sure it will be successful, but I also have no argument
for why it should not lead to a meaningful result.
If I split a doc per sentence the results might be poor, but if I use
larger pieces, e.g. paragraphs, it might yield some topics (sets of keywords).
Has anyone tried something like this?
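
To make the idea concrete, here is a minimal sketch of what I have in mind.
It is plain Python, not Mahout code, and every function name in it is
hypothetical. It assumes paragraphs are separated by blank lines, weights
terms with a smoothed TF-IDF, seeds k-means naively with the first k
paragraphs, and does no stop-word removal, so treat it as an illustration
of the approach rather than a tested solution.

```python
import math
import re
from collections import Counter


def split_paragraphs(text):
    """Split a document into paragraphs on blank lines (an assumption)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def tokenize(paragraph):
    """Lowercase and keep alphabetic tokens only; no stop-word removal."""
    return re.findall(r"[a-z]+", paragraph.lower())


def tfidf_vectors(paragraphs):
    """Build one sparse TF-IDF vector (dict: term -> weight) per paragraph."""
    counts = [Counter(tokenize(p)) for p in paragraphs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(paragraphs)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        for c in counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, iterations=10):
    """Cosine-based k-means; seeds are simply the first k vectors."""
    centroids = [dict(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each paragraph to its most similar centroid.
        assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        # Recompute each centroid as the sum of its members' vectors.
        for c in range(k):
            total = Counter()
            for v, a in zip(vectors, assignment):
                if a == c:
                    total.update(v)
            if total:
                centroids[c] = dict(total)
    return assignment, centroids


def top_terms(centroid, n=5):
    """Highest-weighted terms of a centroid, i.e. the candidate 'topic'."""
    ranked = sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]
```

Usage would be roughly: vectors = tfidf_vectors(split_paragraphs(text)),
then assignment, centroids = kmeans(vectors, k), then top_terms(c) for each
centroid c to read off the keywords of each cluster.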

On Fri, Apr 30, 2010 at 8:24 PM, Grant Ingersoll <gsingers@apache.org> wrote:

>
> On Apr 30, 2010, at 1:15 PM, Robin Anil wrote:
>
> > On Fri, Apr 30, 2010 at 10:40 PM, Bogdan Vatkov <bogdan.vatkov@gmail.com> wrote:
> >
> >> Hi Grant,
> >>
> >> You are probably right.
> >> What I wanted is to use my Mahout setup to extract topics from a single
> >> document.
> >> So, maybe in popular terms, I am trying to do topic extraction via
> >> document clustering.
> >> Does it make sense to try to split a doc into sub-docs so that I can
> >> leverage the clustering algorithm and thus find the topics which appear
> >> to be the key ones for the document?
> >>
> > Have you heard of LDA (it's in Mahout)? Or are you trying to do something
> > different for topic extraction?
>
> That's more across docs.  You might also have a look at TextRank, which is
> a graph-based approach to keyword/topic extraction that is nice to implement
> (one of these days, I'll do it in Mahout).




-- 
Best regards,
Bogdan
