mahout-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: Clustering single doc as multiple docs
Date Fri, 30 Apr 2010 15:18:13 GMT
This strikes me a little bit as an XY problem: http://people.apache.org/~hossman/#xyproblem

Perhaps it would be helpful if you could back up a little and describe the higher level problem
you are trying to solve.  You certainly can split up your documents and then cluster them,
but I'm not sure that is actually going to give you what you need.

Cheers,
Grant

On Apr 30, 2010, at 5:29 AM, Bogdan Vatkov wrote:

> Hi,
> 
> I would like to run clustering on a single document and have multiple
> clusters extracted from it.
> I guess I have to find a way to split the doc into multiple docs / input
> vectors, but I am wondering if there are any best practices on how to do the
> split.
> Should I derive vectors based on sentences or paragraphs? Is there a
> paragraph boundary detection tool around?
> Any recommendations will be appreciated.
> 
> Best regards,
> Bogdan
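
The splitting step discussed above (paragraphs or sentences as separate input documents) can be sketched roughly as follows. This is only a minimal illustration and not something either poster proposed: the helper names `split_paragraphs` and `split_sentences` are hypothetical, paragraphs are assumed to be separated by blank lines, and the sentence splitter is a naive regex that will mis-split abbreviations such as "e.g.". Turning the pieces into vectors and clustering them is left out.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs, assuming blank lines separate them."""
    parts = re.split(r"\n\s*\n", text.strip())
    # Collapse internal whitespace and drop empty pieces.
    return [" ".join(p.split()) for p in parts if p.strip()]


def split_sentences(paragraph):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [s for s in parts if s]


doc = "First topic. More on it.\n\nSecond topic here.\n\n\nThird one? Yes."
paragraphs = split_paragraphs(doc)
print(paragraphs)
print(split_sentences(paragraphs[0]))
```

Each returned piece would then be treated as its own "document" when building input vectors. Whether that gives useful clusters depends on the higher-level goal, as the reply above points out.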


