flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2053) Preregister ML types for Kryo serialization
Date Mon, 25 May 2015 22:43:17 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558559#comment-14558559 ]

ASF GitHub Bot commented on FLINK-2053:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/723

    [FLINK-2053] [ml] Adds automatic preregistration of ML types

    Adds automatic type registration of flink-ml types. This is done by providing a type registration method `FlinkMLTools.registerFlinkMLTypes` which is called from within the `fit`, `predict` and `transform` methods of the `Estimator`, `Predictor` and `Transformer`.
    
    Adds de-duplication of registered types at the `ExecutionConfig` by using `LinkedHashSet` which maintains the insertion order.
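
The de-duplication described above can be sketched in plain Java as follows. `RegisteredTypes` and its methods are hypothetical names for illustration, not the actual `ExecutionConfig` code; the only detail taken from the description is the use of `LinkedHashSet`, which ignores repeated additions and iterates in first-insertion order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the list of registered types; not Flink code.
final class RegisteredTypes {
    // LinkedHashSet drops duplicates and iterates in first-insertion order.
    private final Set<Class<?>> types = new LinkedHashSet<>();

    /** Registers a type; returns false if it was already registered. */
    boolean register(Class<?> type) {
        return types.add(type);
    }

    /** Returns the registered types in the order they were first added. */
    List<Class<?>> inOrder() {
        return new ArrayList<>(types);
    }
}
```

With this shape, repeating the registration from every `fit`, `predict` and `transform` call is harmless: registering the same class a second time is a no-op and does not change its position in the order.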
    
    Fixes bug in `BreezeSparseVector` to `FlinkSparseVector` conversion. `BreezeSparseVector` is not always compacted to its maximum and thus leaves some array entries unused. Consequently, only parts of the data arrays should be given to the `FlinkSparseVector`.
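
A minimal Java sketch of the trimming step, assuming the source vector exposes its backing arrays together with a count of occupied slots. `SparseVec`, `fromBacking` and the `used` parameter are hypothetical names; neither the Breeze nor the Flink class is reproduced here.

```java
import java.util.Arrays;

// Hypothetical, simplified sparse vector; not the Flink or Breeze class.
final class SparseVec {
    final int size;        // logical dimension of the vector
    final int[] indices;   // positions of the stored entries
    final double[] data;   // values of the stored entries

    SparseVec(int size, int[] indices, double[] data) {
        this.size = size;
        this.indices = indices;
        this.data = data;
    }

    /**
     * Builds a vector from backing arrays in which only the first
     * {@code used} slots hold real entries; the rest is spare capacity.
     */
    static SparseVec fromBacking(int size, int[] index, double[] data, int used) {
        if (used < 0 || used > index.length || used > data.length) {
            throw new IllegalArgumentException("used is out of range: " + used);
        }
        // Copy only the occupied prefix so spare slots are not read as entries.
        return new SparseVec(size, Arrays.copyOf(index, used), Arrays.copyOf(data, used));
    }
}
```

Passing the full arrays instead of the occupied prefix would make the spare slots look like genuine entries, which is the kind of error the fix above avoids.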

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink preregisterMLTypes

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/723.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #723
    
----
commit 483caef1276c80f60bcc6c97836c8008d62ec72b
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-05-25T22:35:05Z

    [FLINK-2053] [ml] Adds automatic type registration of flink-ml types. Adds de-duplication of registered types at ExecutionConfig. Fixes bug in Breeze SparseVector to Flink SparseVector conversion.

----


> Preregister ML types for Kryo serialization
> -------------------------------------------
>
>                 Key: FLINK-2053
>                 URL: https://issues.apache.org/jira/browse/FLINK-2053
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>              Labels: ML
>             Fix For: 0.9
>
>
> Currently, FlinkML uses interfaces and abstract types to implement generic algorithms. As a consequence, we have to use Kryo to serialize the effective subtypes. To speed up the data transfer, it's necessary to preregister these types so that they are assigned fixed IDs.
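
To illustrate why fixed IDs help, here is a toy Java model of class tagging. It is not Kryo's API or wire format; `ClassTagger` and its methods are hypothetical, and the 4-byte ID size is an assumption chosen for simplicity. It only shows that a registered type can be tagged with a small integer, while an unregistered one needs its full class name in the tag.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of class tagging; an illustration, not Kryo's wire format.
final class ClassTagger {
    private final Map<Class<?>, Integer> ids = new LinkedHashMap<>();

    /** Assigns the next free ID on first registration; later calls return the same ID. */
    int register(Class<?> type) {
        Integer existing = ids.get(type);
        if (existing != null) {
            return existing;
        }
        int id = ids.size();
        ids.put(type, id);
        return id;
    }

    /** Bytes needed to tag one record of the given type in this toy model. */
    int tagSize(Class<?> type) {
        if (ids.containsKey(type)) {
            return Integer.BYTES; // registered: fixed-size integer ID
        }
        // Unregistered: the fully qualified class name has to be written.
        return type.getName().getBytes(StandardCharsets.UTF_8).length;
    }
}
```

In this model the IDs depend on registration order, so every party has to register the same types in the same order for the IDs to agree.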



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
