mahout-user mailing list archives

From Vishal Santoshi <vishal.santo...@gmail.com>
Subject Re: Detecting high bias and variance in AdaptiveLogisticRegression classification
Date Wed, 27 Nov 2013 15:07:18 GMT
Hello Ted,

Are we to assume that SGD is still a work in progress and that the
implementations (CrossFold, Online, Adaptive) are too flawed to be used
realistically? The evolutionary algorithm seems to be the core of
OnlineLogisticRegression, which in turn builds up to the Adaptive/CrossFold
learners.

>> b) for truly on-line learning where no repeated passes through the data ...

What would it take to get to an implementation? How can anyone help?
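For context on what the single-pass case looks like, here is a minimal from-scratch sketch (not Mahout's actual API; the fixed learning rate is an illustrative choice) of the on-line SGD update for binary logistic regression, where each example is seen exactly once:

```java
// Minimal sketch of truly on-line SGD for binary logistic regression.
// Not Mahout's API; the fixed learning rate is an illustrative choice.
final class OnlineLogit {
    final double[] w;     // weight vector
    final double rate;    // fixed learning rate

    OnlineLogit(int numFeatures, double rate) {
        this.w = new double[numFeatures];
        this.rate = rate;
    }

    double predict(double[] x) {
        double dot = 0.0;
        for (int i = 0; i < w.length; i++) dot += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-dot));   // sigmoid
    }

    // One on-line step: gradient of log-loss for a single example (target is 0/1).
    void train(int target, double[] x) {
        double err = target - predict(x);      // residual
        for (int i = 0; i < w.length; i++) w[i] += rate * err * x[i];
    }
}
```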

Regards,





On Wed, Nov 27, 2013 at 2:26 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Well, first off, let me say that I am much less of a fan now of the magical
> cross validation approach and adaptation based on that than I was when I
> wrote the ALR code.  There are definitely legs in the ideas, but my
> implementation has a number of flaws.
>
> For example:
>
> a) the way that I provide for handling multiple passes through the data is
> very easy to screw up.  I think that simply separating the data entirely
> might be a better approach.
>
> b) for truly on-line learning, where no repeated passes through the data
> will ever occur, cross validation is not the best choice.  Much better
> in those cases to use what Google researchers described in [1].
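The method in [1] is FTRL-Proximal. A condensed, self-contained sketch of its per-coordinate update (accumulator names z and n follow the paper; the parameter values are illustrative, and this is not Mahout code):

```java
// Sketch of the per-coordinate FTRL-Proximal update from McMahan et al. [1].
// alpha, beta, l1, l2 are tuning parameters; names follow the paper.
final class FtrlProximal {
    final int d;
    final double alpha, beta, l1, l2;
    final double[] z, n;   // per-coordinate accumulators from the paper

    FtrlProximal(int d, double alpha, double beta, double l1, double l2) {
        this.d = d; this.alpha = alpha; this.beta = beta;
        this.l1 = l1; this.l2 = l2;
        this.z = new double[d]; this.n = new double[d];
    }

    // Lazily computed weight: sparse thanks to the L1 threshold on z[i].
    double weight(int i) {
        if (Math.abs(z[i]) <= l1) return 0.0;
        return -(z[i] - Math.signum(z[i]) * l1)
                / ((beta + Math.sqrt(n[i])) / alpha + l2);
    }

    double predict(double[] x) {
        double dot = 0.0;
        for (int i = 0; i < d; i++) dot += weight(i) * x[i];
        return 1.0 / (1.0 + Math.exp(-dot));
    }

    // One on-line step; g is the log-loss gradient for this example.
    void update(int y, double[] x) {
        double p = predict(x);
        for (int i = 0; i < d; i++) {
            double g = (p - y) * x[i];
            double sigma = (Math.sqrt(n[i] + g * g) - Math.sqrt(n[i])) / alpha;
            z[i] += g - sigma * weight(i);
            n[i] += g * g;
        }
    }
}
```

Note how a coordinate that never sees a nonzero gradient keeps a weight of exactly zero, which is the sparsity that makes this attractive at scale.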
>
> c) it is clear from several reports that the evolutionary algorithm
> prematurely shuts down the learning rate.  I think that Adagrad-like
> learning rates are more reliable.  See [1] again for one of the more
> readable descriptions of this.  See also [2] for another view on adaptive
> learning rates.
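An Adagrad-like rate keeps a per-coordinate sum of squared gradients and divides a base rate by its square root, so frequently-updated coordinates slow down while rarely-seen ones keep learning fast. A minimal sketch (the base rate and the small epsilon are illustrative choices):

```java
// Sketch of an Adagrad-style per-coordinate learning rate: the step size
// for coordinate i decays with the history of that coordinate's own
// gradients instead of being annealed by a global schedule.
final class AdagradRate {
    final double eta;        // base learning rate (illustrative value)
    final double[] sumSq;    // running sum of squared gradients per coordinate

    AdagradRate(int d, double eta) {
        this.eta = eta;
        this.sumSq = new double[d];
    }

    // Record gradient g for coordinate i and return the step size to use now.
    double step(int i, double g) {
        sumSq[i] += g * g;
        return eta / Math.sqrt(1e-8 + sumSq[i]);  // epsilon avoids divide-by-zero
    }
}
```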
>
> d) item (c) is also related to the way that learning rates are adapted in
> the underlying OnlineLogisticRegression.  That needs to be fixed.
>
> e) asynchronous parallel stochastic gradient descent with mini-batch
> learning is where we should be headed.  I do not have time to write it,
> however.
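A single-threaded sketch of the mini-batch half of that idea (in the asynchronous variant, several workers would run this loop against shared weights; this is illustrative, not a Mahout implementation):

```java
// Sketch of mini-batch SGD for logistic regression: gradients are averaged
// over a small batch before a single weight update.  Batching reduces update
// frequency and contention, which is what makes the asynchronous parallel
// variant practical.
final class MiniBatchLogit {
    final double[] w;
    final double rate;

    MiniBatchLogit(int d, double rate) {
        this.w = new double[d];
        this.rate = rate;
    }

    double predict(double[] x) {
        double dot = 0.0;
        for (int i = 0; i < w.length; i++) dot += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-dot));
    }

    // One update from a batch of examples (targets are 0/1).
    void trainBatch(int[] y, double[][] x) {
        double[] grad = new double[w.length];
        for (int k = 0; k < y.length; k++) {
            double err = y[k] - predict(x[k]);
            for (int i = 0; i < w.length; i++) grad[i] += err * x[k][i];
        }
        for (int i = 0; i < w.length; i++) w[i] += rate * grad[i] / y.length;
    }
}
```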
>
> All this aside, I am happy to help in any way that I can given my recent
> time limits.
>
>
> [1] http://research.google.com/pubs/pub41159.html
>
> [2] http://www.cs.jhu.edu/~mdredze/publications/cw_nips_08.pdf
>
>
>
> On Tue, Nov 26, 2013 at 12:54 PM, optimusfan <optimusfan@yahoo.com> wrote:
>
> > Hi-
> >
> > We're currently working on a binary classifier using
> > Mahout's AdaptiveLogisticRegression class.  We're trying to determine
> > whether or not the models are suffering from high bias or variance, and
> > were wondering how to do this using Mahout's APIs?  I can easily
> > calculate the cross-validation error, and I think I could detect high
> > bias or variance if I could compare that number to my training error,
> > but I'm not sure how to do this.  Any other ideas would also be
> > appreciated!
> >
> > Thanks,
> > Ian
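The comparison Ian describes reduces to a small rule of thumb once both numbers are in hand: score the same trained model on its training set and on a held-out set; if both errors are high the model underfits (high bias), and if the training error is low but the held-out error is much higher it overfits (high variance). A sketch, assuming the two error rates are already computed (the thresholds are illustrative, not standard values):

```java
// Sketch of a bias/variance diagnostic from two error measurements.
// highThreshold and gapThreshold are illustrative cutoffs, not a rule.
final class BiasVariance {
    static String diagnose(double trainError, double heldOutError,
                           double highThreshold, double gapThreshold) {
        if (trainError > highThreshold) {
            return "high bias";       // can't even fit the training data
        }
        if (heldOutError - trainError > gapThreshold) {
            return "high variance";   // fits training data, fails to generalize
        }
        return "ok";
    }
}
```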
>
