[ https://issues.apache.org/jira/browse/SPARK1543?page=com.atlassian.jira.plugin.system.issuetabpanels:alltabpanel
]
Shuo Xiang updated SPARK1543:

Description:
This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso
(elastic net, in fact) in mllib.
ADMM is capable of solving a class of composite minimization problems in a distributed way.
Specifically for Lasso (if only L1regularization) or elasticnet (both L1 and L2 regularization),
in each iteration, it requires solving independent systems of linear equations on each partition
and a subsequent softthreholding operation on the driver machine. Unlike SGD, it is a deterministic
algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html).
The linear algebra operations mainly rely on the Breeze library, particularly, it applies
`breeze.linalg.cholesky` to perform cholesky decomposition on each partition to solve the
linear system.
I tried to follow the organization of existing Lasso implementation. However, as ADMM is also
a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may be
worth reorganizing and putting ADMM into a separate section.
was:
This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving Lasso
(elastic net, in fact) in mllib.
ADMM is capable of solving a class of composite minimization problems in a distributed way.
Specifically for Lasso (if only L1regularization) or elasticnet (both L1 and L2 regularization),
it requires solving independent systems of linear equations on each partition and a softthreholding
operation on the driver. Unlike SGD, it is a deterministic algorithm (except for the random
partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html).
The linear algebra operations mainly rely on the Breeze library, particularly, it applies
`breeze.linalg.cholesky` to perform cholesky decomposition on each partition to solve the
linear system.
I tried to follow the organization of existing Lasso implementation. However, as ADMM is also
a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may worth
to reorganize and put ADMM into a separate section.
PR: https://github.com/apache/spark/pull/458
> Add ADMM for solving Lasso (and elastic net) problem
> 
>
> Key: SPARK1543
> URL: https://issues.apache.org/jira/browse/SPARK1543
> Project: Spark
> Issue Type: New Feature
> Reporter: Shuo Xiang
> Priority: Minor
> Labels: features
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving
Lasso (elastic net, in fact) in mllib.
> ADMM is capable of solving a class of composite minimization problems in a distributed
way. Specifically for Lasso (if only L1regularization) or elasticnet (both L1 and L2 regularization),
in each iteration, it requires solving independent systems of linear equations on each partition
and a subsequent softthreholding operation on the driver machine. Unlike SGD, it is a deterministic
algorithm (except for the random partition). Details can be found in the [S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html).
> The linear algebra operations mainly rely on the Breeze library, particularly, it applies
`breeze.linalg.cholesky` to perform cholesky decomposition on each partition to solve the
linear system.
> I tried to follow the organization of existing Lasso implementation. However, as ADMM
is also a good fit for similar optimization problems, e.g., (sparse) logistic regression,
it may be worth reorganizing and putting ADMM into a separate section.

This message was sent by Atlassian JIRA
(v6.2#6252)
