spark-issues mailing list archives

From "Shuo Xiang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-1543) Add ADMM for solving Lasso (and elastic net) problem
Date Tue, 22 Apr 2014 02:42:16 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuo Xiang updated SPARK-1543:
------------------------------

    Description: 
This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving the Lasso
(and, more generally, elastic net) problem in mllib.

ADMM is capable of solving a class of composite minimization problems in a distributed way.
Specifically, for Lasso (L1 regularization only) or elastic net (both L1 and L2 regularization),
each iteration requires solving independent systems of linear equations on each partition,
followed by a soft-thresholding operation on the driver machine. Unlike SGD, it is a deterministic
algorithm (apart from the random data partitioning). Details can be found in
[S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html).
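
For concreteness, here is a minimal sketch (not the code in this PR) of the distributed consensus-ADMM
iteration for Lasso, assuming each partition holds a data block `(A_i, b_i)` stored as Breeze types;
all names (`admmLasso`, `blocks`, `rho`, `lambda`) are illustrative:

```scala
import breeze.linalg.{DenseMatrix, DenseVector}
import org.apache.spark.rdd.RDD

// Element-wise soft-thresholding operator S_k(v).
def softThreshold(v: DenseVector[Double], k: Double): DenseVector[Double] =
  v.map(x => math.signum(x) * math.max(math.abs(x) - k, 0.0))

// Hypothetical driver-side loop: each block i keeps (A_i, b_i, x_i, u_i);
// z is the consensus variable that lives on the driver.
def admmLasso(blocks: RDD[(DenseMatrix[Double], DenseVector[Double])],
              numFeatures: Int, lambda: Double, rho: Double,
              numIterations: Int): DenseVector[Double] = {
  val n = blocks.count().toDouble
  var z = DenseVector.zeros[Double](numFeatures)
  var state = blocks.map { case (a, b) =>
    (a, b, DenseVector.zeros[Double](numFeatures), DenseVector.zeros[Double](numFeatures))
  }
  for (_ <- 0 until numIterations) {
    val zBc = blocks.sparkContext.broadcast(z)
    // x-update: solve (A_i^T A_i + rho I) x_i = A_i^T b_i + rho (z - u_i) on each partition.
    state = state.map { case (a, b, _, u) =>
      val m = a.t * a + DenseMatrix.eye[Double](numFeatures) * rho
      val x = m \ (a.t * b + (zBc.value - u) * rho)
      (a, b, x, u)
    }.cache()
    // z-update on the driver: average x_i + u_i, then soft-threshold.
    val avg = state.map { case (_, _, x, u) => x + u }.reduce(_ + _) / n
    z = softThreshold(avg, lambda / (rho * n))
    // u-update (scaled dual variable) on each partition.
    val zNew = blocks.sparkContext.broadcast(z)
    state = state.map { case (a, b, x, u) => (a, b, x, u + x - zNew.value) }
  }
  z
}
```

For elastic net, only the driver-side z-update would change: the soft-thresholding is combined with
an extra scaling coming from the L2 term.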

The linear algebra operations mainly rely on the Breeze library; in particular, `breeze.linalg.cholesky`
is used to perform a Cholesky decomposition on each partition when solving the linear system.
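
As a rough illustration under the same hypothetical setup as above (function names are made up for
the example): the matrix A_i^T A_i + rho I does not change across iterations, so its Cholesky factor
can be computed once per partition and reused for every new right-hand side.

```scala
import breeze.linalg.{DenseMatrix, DenseVector, cholesky}

// Factor M = A^T A + rho I as L * L^T, with L lower triangular (computed once per partition).
def factorBlock(a: DenseMatrix[Double], rho: Double): DenseMatrix[Double] =
  cholesky(a.t * a + DenseMatrix.eye[Double](a.cols) * rho)

// Solve (L * L^T) x = q by first solving L y = q, then L^T x = y.
def solveBlock(l: DenseMatrix[Double], q: DenseVector[Double]): DenseVector[Double] =
  l.t \ (l \ q)
```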

I tried to follow the organization of the existing Lasso implementation. However, as ADMM is also
a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may be
worth reorganizing and putting ADMM into a separate section.


  was:
This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving the Lasso
(and, more generally, elastic net) problem in mllib.

ADMM is capable of solving a class of composite minimization problems in a distributed way.
Specifically, for Lasso (L1 regularization only) or elastic net (both L1 and L2 regularization),
it requires solving independent systems of linear equations on each partition and a soft-thresholding
operation on the driver. Unlike SGD, it is a deterministic algorithm (apart from the random
data partitioning). Details can be found in
[S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html).

The linear algebra operations mainly rely on the Breeze library; in particular, `breeze.linalg.cholesky`
is used to perform a Cholesky decomposition on each partition when solving the linear system.

I tried to follow the organization of the existing Lasso implementation. However, as ADMM is also
a good fit for similar optimization problems, e.g., (sparse) logistic regression, it may be worth
reorganizing and putting ADMM into a separate section.

PR: https://github.com/apache/spark/pull/458



> Add ADMM for solving Lasso (and elastic net) problem
> ----------------------------------------------------
>
>                 Key: SPARK-1543
>                 URL: https://issues.apache.org/jira/browse/SPARK-1543
>             Project: Spark
>          Issue Type: New Feature
>            Reporter: Shuo Xiang
>            Priority: Minor
>              Labels: features
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> This PR introduces the Alternating Direction Method of Multipliers (ADMM) for solving the
Lasso (and, more generally, elastic net) problem in mllib.
> ADMM is capable of solving a class of composite minimization problems in a distributed
way. Specifically, for Lasso (L1 regularization only) or elastic net (both L1 and L2 regularization),
each iteration requires solving independent systems of linear equations on each partition,
followed by a soft-thresholding operation on the driver machine. Unlike SGD, it is a deterministic
algorithm (apart from the random data partitioning). Details can be found in
[S. Boyd's paper](http://www.stanford.edu/~boyd/papers/admm_distr_stats.html).
> The linear algebra operations mainly rely on the Breeze library; in particular, `breeze.linalg.cholesky`
is used to perform a Cholesky decomposition on each partition when solving the linear system.
> I tried to follow the organization of the existing Lasso implementation. However, as ADMM
is also a good fit for similar optimization problems, e.g., (sparse) logistic regression,
it may be worth reorganizing and putting ADMM into a separate section.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
