mahout-user mailing list archives

From Loic Descotte
Subject SGD/SVM classification : minimum dataset size for training
Date Mon, 12 Sep 2011 13:52:05 GMT
Hi all,

My classification problem is very similar to the "20 newsgroups" 
example, but I don't have access to a large amount of data for 
training.

I'd like to know the "minimum" amount of training data that the SGD or 
SVM algorithms need to give reasonable results.

My data is of the same kind as the 20 newsgroups example, but each 
entry has fewer lines.
The body of each entry is between 40 and 60 words long.

I'd like to try with 10 examples per category (with 2 or 3 categories), 
choosing good examples that contain the most frequent keywords so that 
the learning phase is effective.

Can this give meaningful results with so little data?
