flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5785) Add an Imputer for preparing data
Date Mon, 27 Mar 2017 13:00:44 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943219#comment-15943219 ]

ASF GitHub Bot commented on FLINK-5785:

GitHub user p4nna opened a pull request:


    [FLINK-5785]  Add an Imputer for preparing data

    Provides an imputer method that fills in missing values in a sparse DataSet of vectors.
Missing values can be filled with the mean, the median, or the most frequent value of each row
or, optionally, each column. This way, incomplete data does not have to be thrown away and can
instead be used to train a machine learning algorithm.
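
For readers unfamiliar with imputation, the three strategies named above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in this pull request and not the Flink ML API: the names `impute`, `most_frequent` and `STRATEGIES` are hypothetical, missing entries are assumed to be encoded as `None`, and ties for the most frequent value are assumed to resolve to the smallest candidate.

```python
from collections import Counter
from statistics import mean, median
from typing import Callable, Dict, List, Optional, Sequence

Row = List[Optional[float]]


def most_frequent(values: Sequence[float]) -> float:
    """Return the most common value; ties go to the smallest value (an assumption)."""
    counts = Counter(values)
    highest = max(counts.values())
    return min(v for v, c in counts.items() if c == highest)


# Hypothetical strategy table: maps a strategy name to a function that
# reduces the observed (non-missing) values to one replacement value.
STRATEGIES: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "median": median,
    "most_frequent": most_frequent,
}


def impute(rows: List[Row], strategy: str = "mean", by_column: bool = False) -> List[Row]:
    """Replace None entries with a statistic of the row (default) or of the column."""
    if by_column:
        # Transpose, impute each former column as a row, then transpose back.
        columns = [list(col) for col in zip(*rows)]
        return [list(row) for row in zip(*impute(columns, strategy))]

    fill = STRATEGIES[strategy]
    result = []
    for row in rows:
        observed = [v for v in row if v is not None]
        if not observed:
            # Nothing to compute a statistic from; fail loudly instead of guessing.
            raise ValueError("cannot impute a row or column with no observed values")
        replacement = fill(observed)
        result.append([replacement if v is None else v for v in row])
    return result
```

For example, `impute([[1.0, None, 3.0]], "mean")` returns `[[1.0, 2.0, 3.0]]`. Treating the column case as a transposed row case keeps a single code path; a distributed implementation would have to aggregate per column differently.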

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/p4nna/flink ml-Imputer-edits

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3620

commit f2875ac5890564213d5f055d710976d1fede3962
Author: p4nna <beer@dbs.ifi.lmu.de>
Date:   2017-03-27T09:47:39Z

    Add files via upload

commit 8e6909b52dad34d6c4cd6c84618616ac50cd83d1
Author: p4nna <beer@dbs.ifi.lmu.de>
Date:   2017-03-27T09:49:59Z

    Test for Imputer class
    Two test classes that test the functions implemented in the new Imputer class: one for
row-wise imputing over all vectors and one for vector-wise imputing

commit 0c420a84c136b330135ce180db04d899b5a6f54c
Author: p4nna <beer@dbs.ifi.lmu.de>
Date:   2017-03-27T09:56:51Z

    removed unused imports and methods


> Add an Imputer for preparing data
> ---------------------------------
>                 Key: FLINK-5785
>                 URL: https://issues.apache.org/jira/browse/FLINK-5785
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Stavros Kontopoulos
>            Assignee: Stavros Kontopoulos
> We need to add an Imputer as described in [1].
> "The Imputer class provides basic strategies for imputing missing values, either using
the mean, the median or the most frequent value of the row or column in which the missing
values are located. This class also allows for different missing values encodings."
> References
> 1. http://scikit-learn.org/stable/modules/preprocessing.html#preprocessing
> 2. http://scikit-learn.org/stable/auto_examples/missing_values.html#sphx-glr-auto-examples-missing-values-py

This message was sent by Atlassian JIRA