mahout-user mailing list archives

From "Marcus Persson Lindqvist" <>
Subject text clustering noob
Date Wed, 04 Jun 2008 08:30:10 GMT
Hi list!

I've been following Mahout since the start and am very excited. However,
I'm an ML noob and need some introductory pointers before I can start playing.

What I want to do is fairly simple: I have a small set of text snippets that
map onto a smaller set of articles, so that each article consists of one or
more of the snippets. So I need to group those snippets into articles.
Preferably, I would also like to be able to detect "noise" (a snippet that
has too little or too dirty information and so is not assigned to any article).

I have access to large training sets of "complete" articles.

Does anyone have any tips on how to achieve this? Which of the algorithms
discussed here would be sufficient?

Any help is much appreciated.

