flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2053) Preregister ML types for Kryo serialization
Date Mon, 25 May 2015 22:43:17 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558559#comment-14558559 ]

ASF GitHub Bot commented on FLINK-2053:

GitHub user tillrohrmann opened a pull request:


    [FLINK-2053] [ml] Adds automatic preregistration of ML types

    Adds automatic type registration of flink-ml types. This is done by providing a type
    registration method `FlinkMLTools.registerFlinkMLTypes`, which is called from within the
    `fit`, `predict` and `transform` methods of the `Estimator`, `Predictor` and `Transformer`,
    respectively.

    Adds de-duplication of registered types at the `ExecutionConfig` by using a `LinkedHashSet`,
    which maintains the insertion order.

    Fixes a bug in the `BreezeSparseVector` to `FlinkSparseVector` conversion. A
    `BreezeSparseVector` is not always compacted to its maximum and thus leaves some array
    entries unused. Consequently, only the used parts of the data arrays should be given to the
    `FlinkSparseVector`.
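
The sparse vector fix can be illustrated with a minimal Java sketch. It is not the code from this pull request: the class `SparseArrays` and its fields `index`, `data` and `used` are hypothetical stand-ins, chosen only to show the idea of a vector whose backing arrays are larger than the number of valid entries and which is trimmed before being handed on.

```java
import java.util.Arrays;

// Hypothetical stand-in for a sparse vector whose backing arrays may be
// larger than the number of entries actually in use. Not the Breeze or Flink API.
final class SparseArrays {
    final int[] index;   // positions of the stored entries
    final double[] data; // values of the stored entries
    final int used;      // only the first `used` slots of both arrays are valid

    SparseArrays(int[] index, double[] data, int used) {
        if (used < 0 || used > index.length || used > data.length) {
            throw new IllegalArgumentException("used out of range: " + used);
        }
        this.index = index;
        this.data = data;
        this.used = used;
    }

    // Returns a copy whose arrays contain exactly the used entries,
    // dropping the unused tail of both backing arrays.
    SparseArrays compact() {
        return new SparseArrays(
            Arrays.copyOf(index, used),
            Arrays.copyOf(data, used),
            used);
    }
}
```

Passing the untrimmed arrays along would treat the unused trailing slots as real entries, which is the kind of error the fix above avoids.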

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink preregisterMLTypes

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #723

commit 483caef1276c80f60bcc6c97836c8008d62ec72b
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-05-25T22:35:05Z

    [FLINK-2053] [ml] Adds automatic type registration of flink-ml types. Adds de-duplication
    of registered types at ExecutionConfig. Fixes bug in Breeze SparseVector to Flink SparseVector
    conversion.
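
The de-duplication mentioned in the commit message relies on a property of `java.util.LinkedHashSet`: adding an element that is already present leaves both the contents and the iteration order unchanged. The sketch below shows that behaviour in isolation. `TypeRegistry` is a hypothetical class made up for illustration, not Flink's `ExecutionConfig`.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical registry for illustration only; not Flink's ExecutionConfig.
final class TypeRegistry {
    // LinkedHashSet rejects duplicates and iterates in first-insertion order.
    private final LinkedHashSet<Class<?>> registered = new LinkedHashSet<>();

    // Returns true if the type was newly added, false if it was already registered.
    boolean register(Class<?> type) {
        return registered.add(type);
    }

    // Snapshot of the registered types in the order they were first registered.
    List<Class<?>> inRegistrationOrder() {
        return new ArrayList<>(registered);
    }
}
```

With such a set, calling a registration method from every `fit`, `predict` and `transform` invocation is harmless: repeated registrations of the same type are ignored and the order of the first registrations is kept.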


> Preregister ML types for Kryo serialization
> -------------------------------------------
>                 Key: FLINK-2053
>                 URL: https://issues.apache.org/jira/browse/FLINK-2053
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>              Labels: ML
>             Fix For: 0.9
> Currently, FlinkML uses interfaces and abstract types to implement generic algorithms.
> As a consequence, we have to use Kryo to serialize the effective subtypes. To speed up
> the data transfer, it is necessary to preregister these types in order to assign them fixed
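
Why preregistration helps can be sketched without the Kryo library. Kryo identifies a registered class by a small integer ID, whereas an unregistered class has to be identified by its fully qualified name. The Java class below is a simplified, hypothetical model of that trade-off (`ClassTagger` and its methods are invented for illustration and are not part of Kryo or Flink). It assumes a fixed 4-byte ID, whereas Kryo itself uses a more compact variable-length encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of class tagging; not the Kryo API.
final class ClassTagger {
    private final Map<Class<?>, Integer> ids = new LinkedHashMap<>();

    // Assigns the next free ID on first registration;
    // registering the same type again returns the ID it already has.
    int register(Class<?> type) {
        Integer existing = ids.get(type);
        if (existing != null) {
            return existing;
        }
        int id = ids.size();
        ids.put(type, id);
        return id;
    }

    // Bytes needed to identify the type in this simplified model:
    // a 4-byte ID if registered, otherwise the UTF-8 encoded class name.
    int tagSize(Class<?> type) {
        if (ids.containsKey(type)) {
            return Integer.BYTES;
        }
        return type.getName().getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Because IDs are handed out in registration order in this sketch, sender and receiver must register the same types in the same order to agree on them, which is one reason an order-preserving collection is a natural fit for the list of registered types.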

This message was sent by Atlassian JIRA
