mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Incremental training of Classifier
Date Mon, 28 Dec 2009 19:52:49 GMT
On Mon, Dec 28, 2009 at 11:24 AM, Robin Anil <> wrote:

> Long answer: you can use your 600 docs to test the classifier and see your
> accuracy. Then retrain on the entire document set and test against a test
> data set. So each day you can choose to include or exclude the 600 documents
> that come in, and ensure that you keep your classifier at top performance.
> After some number of documents, you don't get much benefit from retraining.
> Further training would only add overfitting errors.

The suggestion that the 600 new documents be used to monitor performance is
an excellent one.
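
To make the idea concrete, here is a minimal sketch of the "test on the new batch, then train on it" loop. It is not Mahout code: the class `CountingNaiveBayes` and the function `evaluate_then_train` are hypothetical names invented for illustration, and the classifier is a toy multinomial Naive Bayes whose state is only counts, assumed here because that makes folding in new documents trivial.

```python
import math
from collections import Counter, defaultdict


class CountingNaiveBayes:
    """Toy multinomial Naive Bayes (illustrative, not Mahout's classifier).

    The model state is just counts, so new documents can be added
    without revisiting the old ones.
    """

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def update(self, docs):
        """Incrementally train on an iterable of (tokens, label) pairs."""
        for tokens, label in docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        """Return the label with the highest log-probability score."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Laplace smoothing, with one extra slot for unseen words.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def evaluate_then_train(model, batch):
    """Score the new batch with the current model, then train on it.

    Returns the accuracy measured before the batch was added, so the
    number reflects performance on documents the model had not seen.
    """
    correct = sum(model.predict(tokens) == label for tokens, label in batch)
    accuracy = correct / len(batch)
    model.update(batch)
    return accuracy
```

Calling `evaluate_then_train(model, todays_docs)` once per day gives a running accuracy figure on unseen documents. If that figure stops improving, it is a sign that further training is no longer paying off.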

It should be pretty easy to add the "train on incremental data" option to

Also, the k-means algorithm will definitely reach a point of diminishing
returns, but it should be very resistant to overtraining.
