flink-issues mailing list archives

From: gaborhermann <...@git.apache.org>
Subject: [GitHub] flink pull request #2819: [FLINK-4961] [ml] SGD for Matrix Factorization (WI...
Date: Wed, 16 Nov 2016 15:00:41 GMT
GitHub user gaborhermann opened a pull request:

    https://github.com/apache/flink/pull/2819

    [FLINK-4961] [ml] SGD for Matrix Factorization (WIP)

    Please note that this is a work-in-progress PR, opened to discuss some design questions.
A few minor things remain to be done, including the documentation (the Scaladocs are done).
Apart from these items and the open design questions, the PR is ready.
    
    Some notes:
    - Generalized the matrix factorization methods into a `MatrixFactorization` abstract class
(this slightly modifies `ALS`).
    - The algorithm can be executed in parts with `MLTools.persist`, just like `ALS`, to use
less memory.
    - The algorithm uses random block ID initialization and also shuffles the data when doing
the updates. However, it can be made deterministic by setting a seed.
    - The objective function is simply squared loss with L2 regularization, in contrast to
`ALS`'s weighted-lambda-regularization. This could later be extended to other regularization
methods, as SGD is more flexible in terms of loss functions.
    - The learning rate could be changed dynamically using the same methods as in the
`GradientDescent` implementation.
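
To make the difference between the two objectives concrete, here is a minimal single-machine sketch in plain Python of SGD on squared loss with plain L2 regularization. It is not the Flink API and not necessarily how this PR implements the update; the `(user, item, rating)` triple format and the names `objective`, `sgd_step` and `train` are hypothetical. The L2 term is spread over each row's and column's ratings (`lam / n_row[i]`, `lam / n_col[j]`) so that the per-rating updates sum to the plain-L2 objective. Applying the full `lam` on every rating would instead sum to a penalty weighted by rating counts, which is what weighted-lambda-regularization uses.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(ratings, U, V, lam):
    """Squared loss over observed ratings plus plain (unweighted) L2 on all factors."""
    loss = sum((r - dot(U[i], V[j])) ** 2 for i, j, r in ratings)
    reg = lam * (sum(dot(u, u) for u in U) + sum(dot(v, v) for v in V))
    return loss + reg

def sgd_step(i, j, r, U, V, lam, lr, n_row, n_col):
    """One SGD update for rating r at (i, j) (hypothetical helper).

    Per-rating term: (r - u.v)^2 + lam/n_row[i] * |u|^2 + lam/n_col[j] * |v|^2,
    so summing over all ratings gives exactly the plain-L2 objective above.
    """
    u, v = U[i], V[j]
    err = r - dot(u, v)
    # Both updates use the old u and v (gradient step on the per-rating term).
    new_u = [uk + lr * 2 * (err * vk - lam / n_row[i] * uk) for uk, vk in zip(u, v)]
    new_v = [vk + lr * 2 * (err * uk - lam / n_col[j] * vk) for uk, vk in zip(u, v)]
    U[i], V[j] = new_u, new_v

def train(ratings, n_users, n_items, rank=2, lam=0.01, lr=0.05, epochs=300, seed=42):
    """Plain SGD over shuffled ratings; the same seed gives the same result."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    n_row, n_col = [0] * n_users, [0] * n_items
    for i, j, _ in ratings:
        n_row[i] += 1
        n_col[j] += 1
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)  # visit ratings in a fresh random order each epoch
        for i, j, r in order:
            sgd_step(i, j, r, U, V, lam, lr, n_row, n_col)
    return U, V
```

Calling `train(ratings, n_users, n_items, epochs=0)` returns the seeded initial factors, which makes it easy to compare the objective before and after training.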
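
The random block IDs and the shuffling follow the general pattern of distributed SGD (DSGD, as the branch name `dsgd` suggests): users and items are split into `k` blocks, and each sub-epoch processes a stratum of `k` blocks that share no user block and no item block, so those blocks could be updated in parallel. The sketch below assumes that pattern. `schedule` and its helpers are hypothetical names, and the stratum layout `(b, (b + s) mod k)` is one standard choice, not necessarily the one used in this PR. Passing the same seed reproduces the block assignment, the stratum order and the in-block shuffle.

```python
import random

def assign_blocks(ids, num_blocks, rng):
    """Hypothetical helper: map each user or item id to a random block id."""
    return {x: rng.randrange(num_blocks) for x in ids}

def strata(num_blocks, rng):
    """Strata of (user_block, item_block) pairs; pairs within one stratum are disjoint."""
    shifts = list(range(num_blocks))
    rng.shuffle(shifts)  # randomize the order in which strata are visited
    return [[(b, (b + s) % num_blocks) for b in range(num_blocks)] for s in shifts]

def schedule(ratings, num_blocks, seed):
    """Seeded update plan: a list of strata, each a list of ((ub, ib), ratings)."""
    rng = random.Random(seed)
    user_block = assign_blocks(sorted({i for i, _, _ in ratings}), num_blocks, rng)
    item_block = assign_blocks(sorted({j for _, j, _ in ratings}), num_blocks, rng)
    plan = []
    for stratum in strata(num_blocks, rng):
        step = []
        for ub, ib in stratum:
            block = [r for r in ratings
                     if user_block[r[0]] == ub and item_block[r[1]] == ib]
            rng.shuffle(block)  # shuffle the ratings inside a block before updating
            step.append(((ub, ib), block))
        plan.append(step)
    return plan
```

Across all `k` strata every `(user_block, item_block)` pair appears exactly once, so each rating is visited exactly once per epoch.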
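
Dynamic learning rates in SGD usually mean decaying the step size with the iteration number. The sketch below shows three simple schedules (constant, inverse square root, and inverse scaling with a decay exponent) as plain Python functions. The function names are hypothetical. Flink's `GradientDescent` offers a configurable learning-rate method with schedules of this kind, but check its documentation for the exact options and formulas rather than relying on these.

```python
import math

def constant_rate(initial_lr, t):
    """Same step size at every iteration t (t >= 1)."""
    return initial_lr

def inverse_sqrt_rate(initial_lr, t):
    """initial_lr / sqrt(t): a common decaying schedule for SGD."""
    return initial_lr / math.sqrt(t)

def inverse_scaling_rate(initial_lr, t, decay=0.5):
    """initial_lr / t**decay; with decay=0.5 this equals inverse_sqrt_rate."""
    return initial_lr / t ** decay

def rates(schedule, initial_lr, epochs, **kwargs):
    """Step sizes for epochs 1..epochs under the given schedule."""
    return [schedule(initial_lr, t, **kwargs) for t in range(1, epochs + 1)]
```

In a training loop, the rate for epoch `t` (counting from 1) would simply be passed as the learning rate of that epoch's updates.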

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborhermann/flink dsgd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2819
    
----
commit 88fffbf86e7a2ac8b1adc459e01e084ab2492e07
Author: Daniel Abram <abram.daniel@hotmail.com>
Date:   2016-11-16T13:34:51Z

    [FLINK-4961] SGD for Matrix Factorization

commit 9bd6f2ea4a4fec2e7f4c64cf2b14453f3ba91e48
Author: Gábor Hermann <code@gaborhermann.com>
Date:   2016-11-16T13:35:10Z

    [FLINK-4961] SGD for Matrix Factorization test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes to, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
