spark-issues mailing list archives

From "Aaron Staple (JIRA)" <>
Subject [jira] [Commented] (SPARK-1503) Implement Nesterov's accelerated first-order method
Date Fri, 06 Mar 2015 20:03:38 GMT


Aaron Staple commented on SPARK-1503:

I recently created a PR for an implementation of accelerated gradient descent without backtracking,
as discussed above.

I also ran some simple, small scale (single server) benchmarks to compare different optimization
methods. (The benchmark result graphs are provided as images attached to this ticket.) The
optimization methods tested were:

- gra: existing gradient descent implementation (using full batch)
- acc: accelerated descent (as implemented in the PR), but without automatic restart
- acc_r: accelerated descent, with automatic restart
- acc_b: accelerated descent, with backtracking, without automatic restart
- acc_rb: accelerated descent, with backtracking, with automatic restart
- lbfgs: existing lbfgs implementation

(Note that backtracking is not part of the PR; it was implemented only for this comparison.)
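For reference, the kind of update the acc / acc_r variants are built on can be sketched in plain Scala as below. This is a self-contained toy, not the PR's code: the object and method names are illustrative assumptions, the step size is fixed (no backtracking), and the restart rule is the standard gradient-based test from the adaptive-restart literature.

```scala
// Toy sketch of Nesterov-style accelerated gradient descent with a
// FISTA momentum schedule and optional gradient-based adaptive restart.
// Illustrative only; not the API or code of the PR discussed above.
object AccelSketch {
  type Vec = Array[Double]

  private def axpy(a: Double, x: Vec, y: Vec): Vec =
    x.zip(y).map { case (xi, yi) => a * xi + yi }

  private def sub(x: Vec, y: Vec): Vec =
    x.zip(y).map { case (a, b) => a - b }

  private def dot(x: Vec, y: Vec): Double =
    x.zip(y).map { case (a, b) => a * b }.sum

  /** Minimize a smooth convex f via its gradient, with fixed step size. */
  def accelerate(grad: Vec => Vec, x0: Vec, step: Double,
                 iters: Int, restart: Boolean): Vec = {
    var x = x0.clone   // current iterate
    var y = x0.clone   // momentum (extrapolation) point
    var t = 1.0        // FISTA momentum parameter
    for (_ <- 1 to iters) {
      val g = grad(y)
      val xNew = axpy(-step, g, y)  // gradient step from the momentum point
      val tNew = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
      if (restart && dot(g, sub(xNew, x)) > 0) {
        // Momentum points uphill: reset it (gradient-based adaptive restart).
        t = 1.0
        y = xNew.clone
      } else {
        // Extrapolate: y = xNew + ((t - 1) / tNew) * (xNew - x)
        y = axpy((t - 1.0) / tNew, sub(xNew, x), xNew)
        t = tNew
      }
      x = xNew
    }
    x
  }
}
```

On a simple quadratic, the restart branch fires whenever the momentum direction would increase the objective, which is the mechanism that makes acc_r more robust than plain acc in the attached plots.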

The x axis shows the number of outer loop iterations of the optimization algorithm. (Note
that for the backtracking implementations, the full cost of backtracking is not reflected in
this outer loop count. For the non backtracking implementations, the number of outer loop
iterations is the same as the number of Spark map reduce jobs.) The y axis is the log of the
difference between the current objective value and the best optimized value found.

The optimization test runs were:

- linear: A scaled up version of the test data from TFOCS’s test_LASSO.m example, with 10000
observations on 1024 features, 512 of which are actually correlated with the result. Unregularized
linear regression was used. (The Scala acceleration implementation was observed to be consistent
with the TFOCS implementation on this dataset.)
- linear l1: The same as ‘linear’, but with l1 regularization
- logistic: Each feature of each observation is generated by summing a per-feature Gaussian
specific to the observation’s binary category with a noise Gaussian. 10000 observations on
250 features. Unregularized logistic regression was used.
- logistic l2: Same as ‘logistic’, but using l2 regularization
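As a concrete illustration of the 'logistic' data description above, such a generator can be sketched as below. The object name, unit-variance Gaussians, and use of a fixed seed are assumptions for illustration, not the benchmark's exact parameters.

```scala
// Sketch of the synthetic logistic benchmark data described above:
// each feature is a category-specific Gaussian mean plus Gaussian noise.
// Parameters (variances, seed) are illustrative assumptions.
object LogisticDataSketch {
  def generate(n: Int, d: Int, seed: Long): (Array[Array[Double]], Array[Int]) = {
    val rng = new scala.util.Random(seed)
    // One mean vector per binary category, drawn once up front.
    val means = Array.fill(2, d)(rng.nextGaussian())
    // Random binary labels, then per-feature category mean + noise.
    val labels = Array.fill(n)(rng.nextInt(2))
    val features = labels.map { y =>
      Array.tabulate(d)(j => means(y)(j) + rng.nextGaussian())
    }
    (features, labels)
  }
}
```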

Note that for each test run, all optimization methods were given the same initial step size.

Some observations from these runs:

- Acceleration consistently converges more quickly than standard gradient descent, given the
same initial step size.
- Automatic restart is helpful for acceleration convergence.
- Backtracking can significantly boost convergence rates in some cases (measured in terms
of outer loop iterations), but the full cost of backtracking was not measured in these runs.
- lbfgs generally outperformed accelerated gradient descent in these test runs. Accelerated
gradient descent was competitive with lbfgs for linear l1 (lasso) regression. However, as noted
in the documentation, the L1Updater “will not work” with the lbfgs implementation. It seems
l1 regularization is currently a weak spot for the lbfgs implementation.
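For context on the backtracking cost mentioned above, the general technique behind the acc_b / acc_rb runs is a line search of the Armijo sufficient-decrease kind, sketched below. This is the textbook method, not the code used in the benchmarks; the names are illustrative. Each halving of the step re-evaluates the objective, which is the extra per-iteration cost not reflected in the outer loop counts.

```scala
// Sketch of a standard Armijo backtracking line search. Each shrink of
// the step requires another objective evaluation (on Spark, another job),
// which is why outer loop counts understate the cost of backtracking.
object BacktrackSketch {
  type Vec = Array[Double]

  /** Shrink step until the Armijo sufficient-decrease condition holds. */
  def backtrack(f: Vec => Double, grad: Vec, x: Vec,
                step0: Double, beta: Double = 0.5, c: Double = 1e-4): Double = {
    val fx = f(x)
    val g2 = grad.map(g => g * g).sum  // squared gradient norm
    var step = step0
    var xNew = x.zip(grad).map { case (xi, gi) => xi - step * gi }
    while (f(xNew) > fx - c * step * g2) {
      step *= beta                     // shrink the step and retry
      xNew = x.zip(grad).map { case (xi, gi) => xi - step * gi }
    }
    step
  }
}
```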

> Implement Nesterov's accelerated first-order method
> ---------------------------------------------------
>                 Key: SPARK-1503
>                 URL:
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Aaron Staple
> Nesterov's accelerated first-order method is a drop-in replacement for steepest descent
but it converges much faster. We should implement this method and compare its performance
with existing algorithms, including SGD and L-BFGS.
> TFOCS ( is a reference implementation of Nesterov's method and
its variants on composite objectives.

This message was sent by Atlassian JIRA

