mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Incremental training of Classifier
Date Mon, 28 Dec 2009 19:52:49 GMT
On Mon, Dec 28, 2009 at 11:24 AM, Robin Anil <robin.anil@gmail.com> wrote:

> Long answer, You can use your 600 docs to test the classifier and see your
> accuracy. Then retrain with the entire documents and then test a test data
> set. So daily you can choose to include or exclude the 600 documents that
> come and ensure that you keep your classifier at the top performance. After
> some amount of documents, you don't get much benefit of retraining. Further
> training would only add overfitting errors.
>

The suggestion that the 600 new documents be used to monitor performance is
an excellent one.

It should be pretty easy to add the "train on incremental data" option to
K-means.
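
One way such an option could work is the sequential (online) form of k-means, where each new point nudges its nearest centroid toward itself. The Python sketch below assumes that approach; it is not Mahout's implementation, and the names nearest and update_centroids are hypothetical.

```python
def nearest(centroids, point):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(centroids[i], point)),
    )


def update_centroids(centroids, counts, batch):
    """Fold a batch of new points into existing centroids, one point at a time.

    centroids: list of coordinate lists from earlier training
    counts:    number of points already assigned to each centroid
    batch:     new points to incorporate
    Both centroids and counts are updated in place and also returned.
    """
    for point in batch:
        i = nearest(centroids, point)
        counts[i] += 1
        rate = 1.0 / counts[i]  # step size shrinks as the cluster grows
        centroids[i] = [c + rate * (x - c) for c, x in zip(centroids[i], point)]
    return centroids, counts
```

With the 1/count step size, each centroid stays the running mean of the points assigned to it, so later batches move it less and less.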

Also, the k-means algorithm will definitely reach a point of diminishing
returns, but it should be very resistant to overtraining.
