mahout-user mailing list archives

From Lucas Fernandes Brunialti <lbrunia...@igcorp.com.br>
Subject Re: SVM Implementation for mahout?
Date Sun, 08 Dec 2013 18:13:21 GMT
Hi,

Fernando, to get a better understanding of correlation, you could think of
features as events in probability: if the probability of the intersection is
much higher than the product of the individual probabilities, the events are
highly correlated.
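
As a rough illustration of that idea, one way to make "high" concrete is to
compare the joint probability of two words with what independence would
predict, P(a) * P(b). The sketch below does this on a tiny corpus; the
documents and the helper names prob and lift are made up for the example and
are not part of Mahout.

```python
# Toy corpus, invented for illustration; each document is a set of words.
docs = [
    {"new", "york", "city"},
    {"new", "york", "times"},
    {"new", "car"},
    {"york", "minster"},
    {"city", "car"},
    {"times", "city"},
]


def prob(*words):
    """Fraction of documents that contain all of the given words."""
    hits = sum(1 for d in docs if all(w in d for w in words))
    return hits / len(docs)


def lift(a, b):
    """P(a and b) / (P(a) * P(b)); a value of 1.0 is what independence predicts."""
    return prob(a, b) / (prob(a) * prob(b))


# "new" and "york" co-occur more often than independence predicts (ratio above 1).
print("new/york:", lift("new", "york"))
# "new" and "city" co-occur less often than independence predicts (ratio below 1).
print("new/city:", lift("new", "city"))
```

A ratio well above 1 means that seeing one word makes the other more likely,
which is exactly the kind of dependence the naive Bayes independence
assumption ignores.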

I agree with Ted. But usually, naive Bayes works well for text
classification when you have a good pre-processing phase, using PCA, TF-IDF
or LDA... Are you doing any pre-processing?
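
To make Ted's point about overconfidence concrete, here is a minimal sketch
under strong assumptions: two classes with equal priors and a single word
whose likelihoods (0.8 and 0.2) are invented for the example. The word is
then counted several times, as if perfectly correlated copies of it were
independent evidence. The function name nb_posterior is hypothetical.

```python
def nb_posterior(p_w_given_a, p_w_given_b, copies, prior_a=0.5):
    """Posterior of class A under naive Bayes when the same word feature
    is counted `copies` times as if each copy were independent evidence."""
    score_a = prior_a * p_w_given_a ** copies
    score_b = (1 - prior_a) * p_w_given_b ** copies
    return score_a / (score_a + score_b)


# With one copy the posterior equals what the evidence supports.
# Extra copies add no information, yet the posterior moves toward 1.
for k in (1, 2, 3, 5):
    print(k, nb_posterior(0.8, 0.2, k))
```

The predicted class does not change as copies are added, only the reported
confidence, which matches the observation that the classifier usually still
works reasonably well.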
On Dec 8, 2013 3:25 PM, "Ted Dunning" <ted.dunning@gmail.com> wrote:

>
> The problem of correlation of features is clearly present in text, but it
> is not so clear what the effect will be. For naive Bayes this has the
> effect of making the classifier overconfident, but it usually still works
> reasonably well. For logistic regression without regularization it can
> cause the learning algorithm to fail (Mahout's logistic regression is
> regularized, btw).
>
> Empirical evidence dominates theory in this situation.
>
> Sent from my iPhone
>
> > On Dec 8, 2013, at 9:14, Fernando Santos <fernandoleandro1991@gmail.com> wrote:
> >
> > Now just a theoretical question. In a text classification example, what
> > would it mean to have features that are highly correlated? I mean, in this
> > case our features are basically words; do you have an example of how these
> > features can fail to be independent? This concept is not really clear in
> > my mind...
>
