mahout-user mailing list archives

From: Ted Dunning <ted.dunn...@gmail.com>
Subject: Re: Detecting high bias and variance in AdaptiveLogisticRegression classification
Date: Wed, 27 Nov 2013 07:26:11 GMT
Well, first off, let me say that I am much less of a fan now of the magical
cross-validation approach, and of adaptation based on it, than I was when I
wrote the ALR code.  The ideas definitely have legs, but my implementation
has a number of flaws.

For example:

a) the way that I provide for handling multiple passes through the data is
very easy to screw up.  I think that simply separating the data entirely
might be a better approach; there is a sketch of that after this list.

b) for truly on-line learning, where no repeated passes through the data
will ever occur, cross validation is not the best choice.  Much better in
those cases to use what Google researchers described in [1]; see the
progressive validation sketch below.

c) it is clear from several reports that the evolutionary algorithm
prematurely shuts down the learning rate.  I think that AdaGrad-like
learning rates are more reliable.  See [1] again for one of the more
readable descriptions of this, and see [2] for another view on adaptive
learning rates.  A minimal AdaGrad sketch also follows this list.

d) item (c) is also related to the way that learning rates are adapted in
the underlying OnlineLogisticRegression.  That needs to be fixed.

e) asynchronous parallel stochastic gradient descent with mini-batch
learning is where we should be headed.  I do not have time to write it,
however.
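
To make (a) concrete, and since it also bears on Ian's question below,
here is a minimal sketch of the separate-the-data approach: shuffle once,
carve off a held-out slice before any training happens, then compare
training AUC to held-out AUC.  If both are poor you are looking at bias;
if training looks good and held-out does not, you are looking at variance.
I am writing this against the 0.x sgd classes from memory, so treat the
exact constructors and signatures as assumptions to check against your
version; the Example wrapper exists only for the sketch.

  import java.util.Collections;
  import java.util.List;
  import java.util.Random;

  import org.apache.mahout.classifier.evaluation.Auc;
  import org.apache.mahout.classifier.sgd.L2;
  import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
  import org.apache.mahout.math.Vector;

  public class HoldoutCheck {
    // Hypothetical wrapper for this sketch: target in {0, 1} plus an
    // already-encoded feature vector.
    public static class Example {
      final int target;
      final Vector features;
      public Example(int target, Vector features) {
        this.target = target;
        this.features = features;
      }
    }

    public static void diagnose(List<Example> data, int numFeatures) {
      // Separate the data entirely before any training happens.
      Collections.shuffle(data, new Random(42));
      int cut = (int) (0.8 * data.size());
      List<Example> train = data.subList(0, cut);
      List<Example> heldOut = data.subList(cut, data.size());

      OnlineLogisticRegression learner =
          new OnlineLogisticRegression(2, numFeatures, new L2(1))
              .learningRate(0.1);
      for (Example ex : train) {
        learner.train(ex.target, ex.features);
      }

      Auc trainAuc = new Auc();
      for (Example ex : train) {
        trainAuc.add(ex.target, learner.classifyScalar(ex.features));
      }
      Auc heldOutAuc = new Auc();
      for (Example ex : heldOut) {
        heldOutAuc.add(ex.target, learner.classifyScalar(ex.features));
      }

      // Both low: bias (model or features too weak).
      // Training high, held-out much lower: variance (over-fitting).
      System.out.printf("train AUC = %.3f, held-out AUC = %.3f%n",
          trainAuc.auc(), heldOutAuc.auc());
    }
  }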
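
On (b), the trick from [1] is progressive validation: score every example
with the current model before training on it, and keep a running average
of that loss.  Nothing is held out, yet the number is honest because no
example is ever scored by a model that has already seen it.  Here is a
framework-free sketch; the learner interface is mine, not Mahout's.

  import java.util.List;

  public class ProgressiveValidation {
    // Hypothetical contract for this sketch; adapt to your model.
    public interface OnlineLearner {
      double predict(double[] features);          // estimated p(y = 1 | x)
      void train(int target, double[] features);  // one online update
    }

    public static class Example {
      final int target;         // 0 or 1
      final double[] features;
      public Example(int target, double[] features) {
        this.target = target;
        this.features = features;
      }
    }

    // Returns average log-loss over the stream, always scored before
    // training, so it estimates generalization without a held-out set.
    public static double run(OnlineLearner learner, List<Example> stream) {
      double totalLoss = 0;
      for (Example ex : stream) {
        double p = learner.predict(ex.features);
        // clamp so a confident mistake does not produce log(0)
        p = Math.min(1 - 1e-12, Math.max(1e-12, p));
        totalLoss += ex.target == 1 ? -Math.log(p) : -Math.log(1 - p);
        learner.train(ex.target, ex.features);
      }
      return totalLoss / stream.size();
    }
  }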
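
And for (c), the point of an AdaGrad-like rate is that nothing outside the
learner ever turns the rate down globally; each weight's step size shrinks
only as gradient actually accumulates on that feature.  A from-scratch
sketch for logistic regression (this is not what OnlineLogisticRegression
does today; it is what I am suggesting instead):

  public class AdagradLogistic {
    private final double[] weights;
    private final double[] sumSqGrad;  // per-feature squared-gradient history
    private final double eta0;         // base learning rate
    private static final double EPS = 1e-8;

    public AdagradLogistic(int numFeatures, double eta0) {
      this.weights = new double[numFeatures];
      this.sumSqGrad = new double[numFeatures];
      this.eta0 = eta0;
    }

    public double predict(double[] x) {
      double dot = 0;
      for (int i = 0; i < weights.length; i++) {
        dot += weights[i] * x[i];
      }
      return 1.0 / (1.0 + Math.exp(-dot));
    }

    // One SGD step on a single example with target in {0, 1}.
    public void train(int target, double[] x) {
      double error = predict(x) - target;  // d(log-loss)/d(dot product)
      for (int i = 0; i < weights.length; i++) {
        double g = error * x[i];
        sumSqGrad[i] += g * g;
        // rate = eta0 / sqrt(history): decays per feature, never globally
        weights[i] -= eta0 / (Math.sqrt(sumSqGrad[i]) + EPS) * g;
      }
    }
  }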

All this aside, I am happy to help in any way that I can, given my recent
time limits.


[1] http://research.google.com/pubs/pub41159.html

[2] http://www.cs.jhu.edu/~mdredze/publications/cw_nips_08.pdf



On Tue, Nov 26, 2013 at 12:54 PM, optimusfan <optimusfan@yahoo.com> wrote:

> Hi-
>
> We're currently working on a binary classifier using
> Mahout's AdaptiveLogisticRegression class.  We're trying to determine
> whether or not the models are suffering from high bias or variance and were
> wondering how to do this using Mahout's APIs?  I can easily calculate the
> cross validation error and I think I could detect high bias or variance if
> I could compare that number to my training error, but I'm not sure how to
> do this.  Or, any other ideas would be appreciated!
>
> Thanks,
> Ian
