flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2157) Create evaluation framework for ML library
Date Wed, 08 Jul 2015 12:39:05 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14618518#comment-14618518 ]

ASF GitHub Bot commented on FLINK-2157:
---------------------------------------

Github user thvasilo commented on a diff in the pull request:

    https://github.com/apache/flink/pull/871#discussion_r34142293
  
    --- Diff: flink-staging/flink-ml/src/main/scala/org/apache/flink/ml/pipeline/Predictor.scala
---
     @@ -72,12 +74,36 @@ trait Predictor[Self] extends Estimator[Self] with WithParameters {
         */
       def evaluate[Testing, PredictionValue](
           testing: DataSet[Testing],
    -      evaluateParameters: ParameterMap = ParameterMap.Empty)(implicit
    -      evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
    +      evaluateParameters: ParameterMap = ParameterMap.Empty)
    +      (implicit evaluator: EvaluateDataSetOperation[Self, Testing, PredictionValue])
         : DataSet[(PredictionValue, PredictionValue)] = {
         FlinkMLTools.registerFlinkMLTypes(testing.getExecutionEnvironment)
         evaluator.evaluateDataSet(this, evaluateParameters, testing)
       }
    +
    +  /** Calculates a numerical score for the [[Predictor]]
    +    *
     +    * By convention, higher scores are considered better, so even if a loss is used as a
     +    * performance measure, it will be negated, so that higher is better.
     +    * @param testing The evaluation DataSet, which contains the features and the true value
    +    * @param evaluateOperation An EvaluateDataSetOperation that produces Double results
    +    * @tparam Testing The type of the features and true value, for example [[LabeledVector]]
    +    * @return A DataSet containing one Double that indicates the score of the predictor
    +    */
    +  def score[Testing](testing: DataSet[Testing])
    --- End diff --
    
    That is true; the assumption I'm making here is that Predictors are either Classifiers or Regressors. For classifiers, strings used as classes would first be translated to numerical representations (by the user or automatically), since my current assumption is that the canonical way to use a classifier is to train it with a `DataSet[LabeledVector]`, which has numerical class labels.
    
    This can of course become problematic if in the future we decide to implement multi-label classification algorithms.
    
    The other option is to try to generalize calculateScore to take `DataSet[(PredictionT, PredictionT)]`, which I think would mean that we would have to generalize most of the Score implementations as well.
    
    Personally, I think the current approach covers the majority of our use cases, and we can deal with its limitations as problems come along.
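
    For illustration, a minimal usage sketch of the API under discussion, assuming a FlinkML regressor such as MultipleLinearRegression and toy data; the `score` call reflects the method proposed in this diff, not an already-released API:

        import org.apache.flink.api.scala._
        import org.apache.flink.ml.common.LabeledVector
        import org.apache.flink.ml.math.DenseVector
        import org.apache.flink.ml.regression.MultipleLinearRegression

        val env = ExecutionEnvironment.getExecutionEnvironment

        // Toy data with numerical labels, following the DataSet[LabeledVector] assumption above
        val trainingData: DataSet[LabeledVector] = env.fromCollection(Seq(
          LabeledVector(1.0, DenseVector(1.0, 2.0)),
          LabeledVector(2.0, DenseVector(2.0, 3.0))))
        val testData: DataSet[LabeledVector] = trainingData

        val mlr = MultipleLinearRegression()
        mlr.fit(trainingData)

        // evaluate produces (trueValue, prediction) pairs ...
        val truthAndPrediction: DataSet[(Double, Double)] = mlr.evaluate(testData)

        // ... while the proposed score would collapse them into a single Double,
        // where higher is better (a loss-based measure would be negated)
        val modelScore: DataSet[Double] = mlr.score(testData)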


> Create evaluation framework for ML library
> ------------------------------------------
>
>                 Key: FLINK-2157
>                 URL: https://issues.apache.org/jira/browse/FLINK-2157
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Theodore Vasiloudis
>              Labels: ML
>             Fix For: 0.10
>
>
> Currently, FlinkML lacks means to evaluate the performance of trained models. It would be great to add {{Evaluators}} which can calculate a score based on the true and predicted labels. This could also be used during cross validation to choose the right hyperparameters.
> Possible scores could be F score [1], zero-one-loss score, etc.
> Resources
> [1] [http://en.wikipedia.org/wiki/F1_score]
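
As a rough sketch of the kind of score described in the issue above (not an existing FlinkML class), a zero-one-loss style accuracy could be computed from (trueLabel, prediction) pairs as follows; the helper name and signature are illustrative only:

    import org.apache.flink.api.scala._

    // Hypothetical accuracy score (1 - zero-one loss) over (trueLabel, prediction) pairs;
    // higher is better, matching the convention discussed in the review above.
    def accuracyScore(truthAndPredictions: DataSet[(Double, Double)]): DataSet[Double] = {
      truthAndPredictions
        .map { pair => (if (pair._1 == pair._2) 1.0 else 0.0, 1L) }  // 1.0 for a correct prediction
        .reduce { (a, b) => (a._1 + b._1, a._2 + b._2) }             // sum of correct predictions and total count
        .map { counts => counts._1 / counts._2 }                     // fraction of correct predictions
    }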



