mahout-user mailing list archives

From Mani Kumar <>
Subject Incremental training of Classifier
Date Mon, 28 Dec 2009 19:16:49 GMT
Hi All,

I have run the 20newsgroups example and got a very good idea of how the
cluster works for a defined dataset.

But I have a slightly different situation here.

* I have an existing set of documents (50k).
* Every day I get more documents, e.g. 1k, of which 600 are already
classified, so I need to classify only 400 documents each day.

So my approach would be:

1. Get all the documents into hdfs
2. Train classifier based on data in hdfs
3. Classify the new unclassified documents.

Right now I don't see a way to add more training documents (the 600 already
classified docs) to the system. Am I missing something?

Also, I don't want to delete the trained model and then create it again from scratch.
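
For illustration, the behaviour being asked for could be sketched as follows. This is plain Python, not Mahout code, and it uses no Mahout API; the class IncrementalNaiveBayes and its method names are hypothetical. It assumes a count-based multinomial naive Bayes model, where newly labeled documents only add to the stored counts, so nothing has to be rebuilt:

```python
from collections import Counter, defaultdict
import math


class IncrementalNaiveBayes:
    """Hypothetical multinomial naive Bayes that keeps raw counts,
    so new labeled documents can be folded in without retraining."""

    def __init__(self):
        self.doc_counts = Counter()               # label -> number of training docs
        self.word_counts = defaultdict(Counter)   # label -> word -> occurrences
        self.vocabulary = set()                   # all words seen in training

    def add_training_docs(self, labeled_docs):
        """Add (label, text) pairs to the existing counts."""
        for label, text in labeled_docs:
            words = text.lower().split()
            self.doc_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, text):
        """Return the label with the highest log-probability,
        using add-one (Laplace) smoothing for unseen words."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNaiveBayes()

# Initial training set (stands in for the existing 50k documents).
model.add_training_docs([
    ("sports", "the team won the match"),
    ("tech", "the new laptop has a fast processor"),
])

# A later daily batch: the already classified documents are simply added.
model.add_training_docs([
    ("sports", "the striker scored a late goal"),
    ("tech", "the compiler update fixes a bug"),
])

# The remaining unclassified documents are then classified with the updated model.
predicted = model.classify("goal scored in the match")
```

Here the second call to add_training_docs plays the role of the daily 600 classified documents, and classify would be applied to each of the remaining 400.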

Mani Kumar
