mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: SVM Implementation for mahout?
Date Sun, 08 Dec 2013 17:24:35 GMT

The problem of correlation of features is clearly present in text, but it is not so clear
what the effect will be. For naive Bayes, correlation makes the classifier overconfident,
but it usually still works reasonably well. For logistic regression without regularization,
it can cause the learning algorithm to fail (Mahout's logistic regression is regularized,
btw).
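
To make the overconfidence point concrete, here is a minimal sketch in plain Python. It is not Mahout code, and the probabilities (0.8 and 0.4) are made up for illustration. It assumes two classes with equal priors and one piece of evidence that shows up as several perfectly correlated words (think of "San" and "Francisco", which almost always appear together). Naive Bayes treats each copy as independent evidence and multiplies the likelihoods.

```python
def naive_bayes_posterior(p_word_given_a, p_word_given_b, copies, prior_a=0.5):
    """P(A | evidence) when one piece of evidence is counted `copies` times.

    Hypothetical helper for illustration: naive Bayes multiplies the
    per-feature likelihoods, so perfectly correlated features act as if
    the same evidence had been observed repeatedly.
    """
    score_a = prior_a * p_word_given_a ** copies
    score_b = (1.0 - prior_a) * p_word_given_b ** copies
    return score_a / (score_a + score_b)


# Illustrative likelihoods only: the word is twice as likely under class A.
copy_counts = (1, 2, 3)
posteriors = [naive_bayes_posterior(0.8, 0.4, k) for k in copy_counts]

for k, p in zip(copy_counts, posteriors):
    print(f"copies={k}  P(A | evidence)={p:.3f}")
```

The predicted class is A in every case, so the decision does not change. Only the reported probability drifts toward 1 as correlated copies are added, which is why the classifier can be overconfident and still classify reasonably well.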

Empirical evidence dominates theory in this situation. 


> On Dec 8, 2013, at 9:14, Fernando Santos <fernandoleandro1991@gmail.com> wrote:
> 
> Now just a theoretical question. In a text classification example, what would
> it mean to have features that are highly correlated?  I mean, in this case
> our features are basically words, so do you have an example of how these
> features can fail to be independent? This concept is not really clear in my
> mind...
