flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2050) Add pipelining mechanism for chainable transformers and estimators
Date Wed, 20 May 2015 23:39:01 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553351#comment-14553351 ]

ASF GitHub Bot commented on FLINK-2050:

GitHub user tillrohrmann opened a pull request:


    [FLINK-2050] Introduces new pipelining mechanism for FlinkML

    This PR introduces the new pipelining mechanism for FlinkML. To make pipelines
applicable to different input types, the algorithm logic and the state of a pipeline operator
have been separated. The logic now lives in implicit values, which the Scala compiler selects
automatically based on the input and output types of the pipeline operators and the
input data.
    The operator itself now holds the model data that is trained in the fit phase. Thus,
the algorithm no longer returns a distinct model.
    The pipelining allows, for example, a pipeline which scales vectors to work on the `Vector`
type as well as the `LabeledVector` type, even though the two types are unrelated. The only
requirement is that implicit values implementing the algorithm are available. This approach is
similar to the mechanism found in the Breeze library.
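
    As a rough illustration of the separation described above, the following is a minimal sketch and not the code in this PR. All names (`Vec`, `LabeledVec`, `FitOperation`, `TransformOperation`, `Scaler`) are hypothetical stand-ins, and plain `Seq` replaces Flink's `DataSet` so the example is self-contained. It shows an operator that only stores model data, while the compiler picks the matching logic from implicit values based on the input type.

```scala
// Hypothetical, simplified stand-ins for illustration; not FlinkML's real types.
case class Vec(values: Seq[Double])
case class LabeledVec(label: Double, vector: Vec)

// Type classes holding the algorithm logic, keyed by operator and data types.
trait FitOperation[Self, Training] {
  def fit(instance: Self, input: Seq[Training]): Unit
}
trait TransformOperation[Self, In, Out] {
  def transform(instance: Self, input: Seq[In]): Seq[Out]
}

// The operator holds only state: the model data learned during fit.
class Scaler {
  var maxAbs: Option[Double] = None

  def fit[T](input: Seq[T])(implicit op: FitOperation[Scaler, T]): Unit =
    op.fit(this, input)

  // The output type O is determined by whichever implicit matches input type I.
  def transform[I, O](input: Seq[I])(
      implicit op: TransformOperation[Scaler, I, O]): Seq[O] =
    op.transform(this, input)
}

object Scaler {
  // Divides every component by the largest absolute value seen during fit.
  private def scale(s: Scaler, v: Vec): Vec = {
    val m = s.maxAbs.getOrElse(sys.error("fit must be called before transform"))
    if (m == 0.0) v else Vec(v.values.map(_ / m))
  }

  implicit val fitVec: FitOperation[Scaler, Vec] =
    new FitOperation[Scaler, Vec] {
      def fit(instance: Scaler, input: Seq[Vec]): Unit =
        instance.maxAbs = Some(
          input.flatMap(_.values).foldLeft(0.0)((m, x) => math.max(m, math.abs(x))))
    }

  implicit val transformVec: TransformOperation[Scaler, Vec, Vec] =
    new TransformOperation[Scaler, Vec, Vec] {
      def transform(instance: Scaler, input: Seq[Vec]): Seq[Vec] =
        input.map(v => scale(instance, v))
    }

  // Same operator, unrelated input type: only another implicit value is needed.
  implicit val transformLabeled: TransformOperation[Scaler, LabeledVec, LabeledVec] =
    new TransformOperation[Scaler, LabeledVec, LabeledVec] {
      def transform(instance: Scaler, input: Seq[LabeledVec]): Seq[LabeledVec] =
        input.map(lv => lv.copy(vector = scale(instance, lv.vector)))
    }
}

object Demo {
  val scaler = new Scaler
  scaler.fit(Seq(Vec(Seq(2.0, -4.0))))
  val scaledVectors = scaler.transform(Seq(Vec(Seq(2.0, -4.0))))
  val scaledLabeled = scaler.transform(Seq(LabeledVec(1.0, Vec(Seq(4.0)))))
}
```

    In this sketch, calling `transform` on a type without a matching implicit value fails at compile time, which is how the "only requirement" mentioned above would surface to the user.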

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink pipeline

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #704
commit 4e5118b10cb7525e19147d49a5fdc6da3aae639c
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-05-05T13:04:32Z

    [FLINK-2050] [ml] Introduces new pipelining mechanism using implicit classes to wrap the
algorithm logic

commit da7d0bfe3a0780b386fcb9b0640513c32ee7bbab
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-05-20T11:49:52Z

    [FLINK-2050] [ml] Ports existing ML algorithms to new pipeline mechanism
    Adds pipeline comments
    Adds pipeline IT case


> Add pipelining mechanism for chainable transformers and estimators
> ------------------------------------------------------------------
>                 Key: FLINK-2050
>                 URL: https://issues.apache.org/jira/browse/FLINK-2050
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>              Labels: ML
>             Fix For: 0.9
> The key concept of an easy-to-use ML library is the quick and simple construction of
data analysis pipelines. Scikit-learn's approach of defining transformers and estimators seems
to be a really good solution to this problem. I propose to follow a similar path, because
it makes FlinkML flexible in terms of code reuse as well as easy to use for people coming
from Scikit-learn.

This message was sent by Atlassian JIRA