spark-user mailing list archives

From Feynman Liang <fli...@databricks.com>
Subject Re: miniBatchFraction for LinearRegressionWithSGD
Date Fri, 07 Aug 2015 18:24:55 GMT
Yep, I think that's what Gerald is saying, and they are proposing to default
miniBatchFraction = (1 / numInstances). Is that correct?
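A minimal Python sketch of what the proposed default would mean. It assumes the mini-batch is drawn by sampling without replacement where each row is kept independently with probability miniBatchFraction (Bernoulli sampling). Under that assumption, miniBatchFraction = 1 / numInstances gives one row per iteration only on average, not exactly one row. The helper names are hypothetical and are not part of Spark's API.

```python
import math

def expected_batch_size(num_instances: int, fraction: float) -> float:
    """Expected mini-batch size when each row is kept with probability `fraction`."""
    return num_instances * fraction

def prob_empty_batch(num_instances: int, fraction: float) -> float:
    """Probability that independent per-row sampling keeps no rows at all."""
    return (1.0 - fraction) ** num_instances

n = 1_000_000
fraction = 1.0 / n  # proposed default: miniBatchFraction = 1 / numInstances
print(expected_batch_size(n, fraction))  # approximately 1.0
print(prob_empty_batch(n, fraction), math.exp(-1))  # both about 0.368
```

Under this sampling assumption, a large dataset would draw an empty batch in roughly 37% of iterations (about 1/e), so an implementation using this default would need to handle empty samples.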

On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetry14@gmail.com>
wrote:

> I think in the SGD algorithm, the mini-batch sample is drawn without
> replacement. So with fraction=1, all the rows will be sampled
> exactly once to form the mini-batch, resulting in the
> deterministic/classical case.
>
> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fliang@databricks.com>
> wrote:
> > Sounds reasonable to me, feel free to create a JIRA (and PR if you're up
> > for it) so we can see what others think!
> >
> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
> > <gerald.loeffler@googlemail.com> wrote:
> >>
> >> hi,
> >>
> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
> >> doesn’t that make it a deterministic/classical gradient descent rather
> >> than an SGD?
> >>
> >> Specifically, miniBatchFraction=1.0 means the entire data set, i.e.
> >> all rows. In the spirit of SGD, shouldn’t the default be the fraction
> >> that results in exactly one row of the data set?
> >>
> >> thank you
> >> gerald
> >>
> >> --
> >> Gerald Loeffler
> >> mailto:gerald.loeffler@googlemail.com
> >> http://www.gerald-loeffler.net
> >>
> >
>
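A minimal Python sketch (not Spark code) of the point Meihua makes. It assumes each iteration's mini-batch is drawn by Bernoulli sampling without replacement. Under that assumption, miniBatchFraction = 1.0 selects every row on every iteration, so the run reduces to classical full-batch gradient descent and the random seed stops mattering. The function name `minibatch_gd`, the no-intercept model, the step size, and the data are all hypothetical, chosen only for illustration.

```python
import random

def minibatch_gd(xs, ys, fraction, num_iterations=100, step_size=1.0, seed=0):
    """Fit y ~ w * x (no intercept) by mini-batch gradient descent on squared loss.

    Each iteration keeps every row independently with probability `fraction`
    (Bernoulli sampling without replacement) and steps along the average
    gradient of 0.5 * (w * x - y) ** 2 over the sampled rows.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(num_iterations):
        # rng.random() is in [0, 1), so fraction = 1.0 always keeps every row.
        batch = [(x, y) for x, y in zip(xs, ys) if rng.random() < fraction]
        if not batch:
            continue  # an empty sample contributes no update
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= step_size * grad
    return w

xs = [i / 10 for i in range(1, 11)]
# Slightly noisy targets around y = 2x (deterministic "noise" for reproducibility).
ys = [2 * x + (0.1 if i % 2 else -0.1) for i, x in enumerate(xs)]

# fraction = 1.0: every iteration sees all rows, so the seed is irrelevant.
print(minibatch_gd(xs, ys, 1.0, seed=1) == minibatch_gd(xs, ys, 1.0, seed=2))  # True
# fraction < 1.0: the sampled rows, and hence the final w, depend on the seed.
print(minibatch_gd(xs, ys, 0.3, seed=1), minibatch_gd(xs, ys, 0.3, seed=2))
```

With fraction = 1.0 the result converges to the ordinary least-squares slope, sum(x*y) / sum(x*x). With a smaller fraction, the two seeded runs typically end at slightly different values of w, which is the stochastic behaviour Gerald expects from SGD.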
