flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1807) Stochastic gradient descent optimizer for ML library
Date Tue, 21 Apr 2015 09:26:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504664#comment-14504664 ]

ASF GitHub Bot commented on FLINK-1807:
---------------------------------------

GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/613

    [WIP] - [FLINK-1807/1889] - Optimization framework and initial SGD implementation

    This is a WIP PR for the optimization framework of the Flink ML library.
    
    The design is a mix of how sklearn and Apache Spark implement their learning-algorithm optimization frameworks.
    
    The idea is that a Learner can take a Solver, a LossFunction, and a RegularizationType as parameters, similar to the design that sklearn uses and that Spark seems to be moving toward. This gives users flexibility in how they design their learning algorithms.
    
    A Solver uses the LossFunction and RegularizationType to optimize the weights according to the provided DataSet of LabeledVector (label, featuresVector); a rough sketch of these interfaces is given below.
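    
    To make the design concrete, here is a minimal sketch of the interfaces described above. All names and signatures are illustrative assumptions, not the actual PR code:
    
        import org.apache.flink.api.scala._
    
        // A training example: (label, featuresVector), as described above.
        case class LabeledVector(label: Double, features: Array[Double])
    
        // Pluggable loss: gradient of the loss for one example at the given weights.
        trait LossFunction {
          def gradient(example: LabeledVector, weights: Array[Double]): Array[Double]
        }
    
        // Pluggable regularization: its contribution to the gradient.
        trait RegularizationType {
          def gradient(weights: Array[Double]): Array[Double]
        }
    
        // A Solver optimizes the weights over the provided data set.
        trait Solver {
          def optimize(data: DataSet[LabeledVector],
                       initialWeights: Array[Double],
                       loss: LossFunction,
                       regularization: RegularizationType): DataSet[Array[Double]]
        }
    
        // A Learner is configured with the three pluggable pieces.
        abstract class Learner(val solver: Solver,
                               val loss: LossFunction,
                               val regularization: RegularizationType)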
    
    As you will see in the TODOs, there are still many open questions regarding the design, and no real RegularizationType has been implemented yet, so that interface could change depending on what we end up needing for the regularization calculation.
    
    A first implementation of Stochastic Gradient Descent is included. As you will see, the stochastic part is still missing, as we are blocked on a sample operator for DataSet; instead we have to map over the whole data set (a sketch of such a full-batch step follows below).
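    
    As an illustration of the full-batch step just described, here is a minimal sketch, assuming the hypothetical types from the interface sketch above and closure-captured weights rather than Flink broadcast variables:
    
        import org.apache.flink.api.scala._
    
        // One gradient step over the *whole* data set: every example
        // contributes a gradient, the gradients are summed element-wise,
        // and the weights take one step against the sum (w <- w - eta * g).
        def step(data: DataSet[LabeledVector],
                 weights: Array[Double],
                 loss: LossFunction,
                 stepSize: Double): DataSet[Array[Double]] = {
          data
            .map(example => loss.gradient(example, weights))
            .reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })
            .map(g => weights.zip(g).map { case (w, gi) => w - stepSize * gi })
        }
    
    Once a sample operator exists, replacing the full pass with a sampled subset would recover the stochastic behaviour.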
    If you run the tests you will see that the third test, where we try to perform just one step of the optimization, does not pass. I haven't been able to figure out why this happens yet; any help would be appreciated.
    
    I've also included a wrapper for BLAS functions that was copied directly from MLlib.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613
    
----
commit 1ed6032b6505488549785ff38b5805586a0465cb
Author: Theodore Vasiloudis <tvas@sics.se>
Date:   2015-04-21T08:59:34Z

    Interfaces for the optimization framework.
    
    BLAS.scala was directly copied from the Apache Spark project.

commit 5a40f14790fd024fdd9a01069262627cda2126a4
Author: Theodore Vasiloudis <tvas@sics.se>
Date:   2015-04-21T09:01:50Z

    Added Stochastic Gradient Descent initial version and some tests.

----


> Stochastic gradient descent optimizer for ML library
> ----------------------------------------------------
>
>                 Key: FLINK-1807
>                 URL: https://issues.apache.org/jira/browse/FLINK-1807
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Theodore Vasiloudis
>              Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in different
> ML algorithms. Thus, it would be helpful to provide a generalized SGD implementation which
> can be instantiated with the respective gradient computation. Such a building block would
> make the development of future algorithms easier.
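
For reference, the generalized update the issue describes is the standard SGD rule, where the pluggable piece is the per-example loss gradient. With step size \eta and a sampled example (x_i, y_i):

    w_{t+1} = w_t - \eta \nabla_w L(w_t; x_i, y_i)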



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
