mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: confidence values of one (or more) feature(s)
Date Thu, 03 Nov 2011 15:06:46 GMT
There are no confidence values per se in the models computed by Mahout at
this time.

There are several issues here:

1) Naive Bayes doesn't have such a concept.  'Nuff said there.

2) SGD logistic regression could compute confidence intervals, but I am
not quite sure how to do that with stochastic gradient descent.

3) In most uses of Mahout's logistic regression, the issues are data size
and feature set size.  Confidence values are typically used for selecting
features, which is typically not a viable strategy for problems with very
large feature sets.  That is what L1 regularization is for.

4) With an extremely large number of features, the noise on confidence
intervals makes them very hard to understand.

5) With hashed features and feature collisions, it is hard enough to
understand which feature is doing what, much less what the confidence
interval means.
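
To make point 5 concrete, here is a minimal sketch of how hashing folds
several features into one weight slot.  It does not use Mahout's encoders;
the crc32 hash, the bucket count, and the made-up vocabulary are all
assumptions chosen only for illustration.

```python
import zlib
from collections import defaultdict


def hashed_index(feature: str, num_buckets: int) -> int:
    """Map a feature name to a slot in a fixed-size weight vector.

    crc32 is a stand-in hash for this sketch, not what Mahout uses.
    """
    return zlib.crc32(feature.encode("utf-8")) % num_buckets


def bucket_members(features, num_buckets):
    """Group feature names by the slot they hash into."""
    buckets = defaultdict(list)
    for name in features:
        buckets[hashed_index(name, num_buckets)].append(name)
    return dict(buckets)


# Hypothetical vocabulary: 1000 distinct tokens squeezed into 100 slots.
vocabulary = [f"word{i}" for i in range(1000)]
buckets = bucket_members(vocabulary, num_buckets=100)

# Slots that hold more than one feature: their single weight is shared.
shared = {slot: names for slot, names in buckets.items() if len(names) > 1}
print(f"{len(shared)} of {len(buckets)} used slots hold more than one feature")
```

Any weight (or interval) attached to a shared slot describes the whole
group of colliding features, not any one of them.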

Can you say more about your problem?  Is it small enough to use bayesglm in
R?
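
For a problem that small, the classical batch recipe is roughly the
following sketch: fit by Newton's method, invert the information matrix,
and read standard errors off its diagonal.  This is plain maximum
likelihood with Wald intervals, not bayesglm and not anything Mahout does;
the toy data and the function names are invented for illustration.

```python
import math


def score_and_information(b0, b1, xs, ys):
    """Return the gradient (g0, g1) and information matrix entries (h00, h01, h11)."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def fit_logistic(xs, ys, iterations=25):
    """Fit intercept and slope by Newton's method; return them with their standard errors."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, xs, ys)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += H^-1 g, using the closed-form 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Covariance of the estimates is the inverse information at the solution.
    _, _, h00, h01, h11 = score_and_information(b0, b1, xs, ys)
    det = h00 * h11 - h01 * h01
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return b0, b1, se0, se1


# Toy data, deliberately not perfectly separable so the fit stays finite.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1, se0, se1 = fit_logistic(xs, ys)
low, high = b1 - 1.96 * se1, b1 + 1.96 * se1  # approximate 95% Wald interval
print(f"slope = {b1:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

The matrix inversion is the step that has no obvious counterpart in a
single SGD pass over the data, and it gets expensive as the number of
features grows.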

On Thu, Nov 3, 2011 at 7:25 AM, David Rahman <drahman1985@googlemail.com> wrote:

> Me again,
>
> can someone point me in the right direction? How can I access these features?
> I looked into the summary(int n) method in
> org.apache.mahout.classifier.sgd.ModelDissector, but somehow I don't
> understand how it works.
>
> Could someone explain to me how it works? As I understand it, it returns
> just the max-value of a feature.
>
> Thanks and regards,
> David
>
> 2011/10/20 David Rahman <drahman1985@googlemail.com>
>
> > Hi,
> >
> > how can I access the confidence values of one (or more) feature(s) with
> > its possibilities?
> >
> > In the 20Newsgroup example, there is the dissect method, which uses
> > summary(int n) to return the n most important features with their
> > weights. I also want the features which are placed second or third (or
> > more). How can I access those?
> >
> > Regards,
> > David
> >
>
