mahout-user mailing list archives

From Bogdan Vatkov
Subject Clustering single doc as multiple docs
Date Fri, 30 Apr 2010 09:29:42 GMT

I would like to run clustering on a single document and extract multiple
clusters from it.
I guess I have to find a way to split the document into multiple documents /
input vectors, but I am wondering whether there are any best practices for
how to do the split.
Should I derive vectors from sentences or from paragraphs? Is there a
paragraph boundary detection tool available?
Any recommendations would be appreciated.
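
To make the question concrete, here is a minimal sketch of the kind of split I mean. It is plain Python, not Mahout code. It assumes paragraphs are separated by blank lines and that a sentence ends with '.', '!' or '?' followed by whitespace. Both rules and the function names are only illustrative assumptions, and the sentence rule will certainly break on abbreviations such as "e.g.".

```python
import re


def split_paragraphs(text):
    """Split text on blank lines (assumed paragraph boundary) and normalize whitespace."""
    parts = re.split(r"\n\s*\n", text)
    return [" ".join(p.split()) for p in parts if p.strip()]


def split_sentences(paragraph):
    """Naive sentence split: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s]


doc = (
    "First topic starts here. It continues.\n"
    "\n"
    "Second topic is different.\n"
    "Still the second one."
)

# Each paragraph (or each sentence) would become one pseudo-document / input vector.
paragraphs = split_paragraphs(doc)
sentences = [s for p in paragraphs for s in split_sentences(p)]
```

Either list could then be vectorized as if each element were a separate document. Which granularity clusters better is exactly what I am unsure about.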

Best regards,
