spark-user mailing list archives

From Feynman Liang <fli...@databricks.com>
Subject Re: miniBatchFraction for LinearRegressionWithSGD
Date Fri, 07 Aug 2015 20:34:40 GMT
Good point; I agree that defaulting to online SGD (a single example per
iteration) would make for a poor UX because of the performance cost.
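
To make the trade-off concrete, here is a minimal sketch in plain Python
(not Spark code). It assumes each row is kept independently with
probability miniBatchFraction, without replacement, which is how I
understand the sampling step to behave. The helper names, the 8
partitions, and the 1000 rows are made up for illustration.

```python
import random


def sample_minibatch(partitions, fraction, seed):
    """Keep each row independently with probability `fraction`.

    Assumption: this mimics per-row sampling without replacement;
    it is an illustration, not the MLlib implementation.
    """
    rng = random.Random(seed)
    return [[row for row in part if rng.random() < fraction]
            for part in partitions]


def batch_size(batch):
    """Total number of rows selected across all partitions."""
    return sum(len(part) for part in batch)


def busy_partitions(batch):
    """Number of partitions that contribute at least one row."""
    return sum(1 for part in batch if part)


# Hypothetical data set: 1000 rows spread round-robin over 8 partitions.
num_instances = 1000
num_partitions = 8
rows = list(range(num_instances))
partitions = [rows[i::num_partitions] for i in range(num_partitions)]

# miniBatchFraction = 1.0: every row is selected on every iteration.
full = sample_minibatch(partitions, 1.0, seed=42)

# miniBatchFraction = 1 / numInstances: about one row per iteration.
online = sample_minibatch(partitions, 1.0 / num_instances, seed=42)

print("fraction=1.0:", batch_size(full), "rows,",
      busy_partitions(full), "busy partitions")
print("fraction=1/n:", batch_size(online), "rows,",
      busy_partitions(online), "busy partitions")
```

With a fraction of 1.0 every row is selected, so each iteration computes
the full, deterministic gradient. With 1 / numInstances the batch holds
about one row on average (it can also be empty or hold a few), so most
partitions have nothing to do in a given iteration.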

On Fri, Aug 7, 2015 at 12:44 PM, Meihua Wu <rotationsymmetry14@gmail.com>
wrote:

> Feynman, thanks for clarifying.
>
> If we default miniBatchFraction = (1 / numInstances), then we will
> only hit one row for every iteration of SGD regardless of the number of
> partitions and executors. In other words the parallelism provided by
> the RDD is lost in this approach. I think this is something we need to
> consider for the default value of miniBatchFraction.
>
> On Fri, Aug 7, 2015 at 11:24 AM, Feynman Liang <fliang@databricks.com>
> wrote:
> > Yep, I think that's what Gerald is saying and they are proposing to default
> > miniBatchFraction = (1 / numInstances). Is that correct?
> >
> > On Fri, Aug 7, 2015 at 11:16 AM, Meihua Wu <rotationsymmetry14@gmail.com>
> > wrote:
> >>
> >> I think in the SGD algorithm, the mini-batch sampling is done without
> >> replacement. So with fraction=1, all the rows will be sampled
> >> exactly once to form the miniBatch, resulting in the
> >> deterministic/classical case.
> >>
> >> On Fri, Aug 7, 2015 at 9:05 AM, Feynman Liang <fliang@databricks.com>
> >> wrote:
> >> > Sounds reasonable to me, feel free to create a JIRA (and PR if you're up
> >> > for it) so we can see what others think!
> >> >
> >> > On Fri, Aug 7, 2015 at 1:45 AM, Gerald Loeffler
> >> > <gerald.loeffler@googlemail.com> wrote:
> >> >>
> >> >> hi,
> >> >>
> >> >> if new LinearRegressionWithSGD() uses a miniBatchFraction of 1.0,
> >> >> doesn’t that make it a deterministic/classical gradient descent rather
> >> >> than an SGD?
> >> >>
> >> >> Specifically, miniBatchFraction=1.0 means the entire data set, i.e.
> >> >> all rows. In the spirit of SGD, shouldn’t the default be the fraction
> >> >> that results in exactly one row of the data set?
> >> >>
> >> >> thank you
> >> >> gerald
> >> >>
> >> >> --
> >> >> Gerald Loeffler
> >> >> mailto:gerald.loeffler@googlemail.com
> >> >> http://www.gerald-loeffler.net
> >> >>
> >> >
> >
> >
>
