spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion
Date Mon, 02 Jan 2017 20:25:58 GMT
Joseph K. Bradley created SPARK-19053:
-----------------------------------------

             Summary: Supporting multiple evaluation metrics in DataFrame-based API: discussion
                 Key: SPARK-19053
                 URL: https://issues.apache.org/jira/browse/SPARK-19053
             Project: Spark
          Issue Type: Brainstorming
          Components: ML
            Reporter: Joseph K. Bradley


This JIRA is to discuss supporting the computation of multiple evaluation metrics efficiently
in the DataFrame-based API for MLlib.

In the RDD-based API, RegressionMetrics and other *Metrics classes support efficient computation
of multiple metrics.
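To illustrate the pattern at stake, here is a minimal sketch (plain Python, not Spark API) of what "multiple metrics in one pass" means: a single traversal of (prediction, label) pairs accumulates enough statistics to report MAE, MSE, RMSE, and R^2 together, which is the kind of computation the spark.mllib *Metrics classes perform internally.

```python
def regression_metrics(pairs):
    """Compute several regression metrics in a single pass over
    (prediction, label) pairs. Illustrative sketch only."""
    n = 0
    abs_err = 0.0       # sum of |prediction - label|
    sq_err = 0.0        # sum of squared errors
    label_sum = 0.0     # for the label mean
    label_sq_sum = 0.0  # for the label variance
    for pred, label in pairs:  # one pass over the data
        n += 1
        err = pred - label
        abs_err += abs(err)
        sq_err += err * err
        label_sum += label
        label_sq_sum += label * label
    mse = sq_err / n
    mean_label = label_sum / n
    var_label = label_sq_sum / n - mean_label ** 2
    return {
        "mae": abs_err / n,
        "mse": mse,
        "rmse": mse ** 0.5,
        "r2": 1.0 - mse / var_label if var_label > 0 else float("nan"),
    }
```

A single-metric Evaluator, by contrast, would repeat the traversal once per metric requested.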

In the DataFrame-based API, there are a few options:
* model/result summaries (e.g., LogisticRegressionSummary): These currently provide the desired
functionality, but they require a model and do not let users compute metrics manually from
DataFrames of predictions and true labels.
* Evaluator classes (e.g., RegressionEvaluator): These do not require a model, but they only support computing a single metric per pass over the data.
* New class analogous to the *Metrics classes: We could introduce such a class for the DataFrame-based API.  Model/result summaries could use it internally as a replacement for the spark.mllib *Metrics classes, or they could (maybe) inherit from it.
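One way the third option could be shaped (a hypothetical sketch; the class and method names here are illustrative, not a proposed Spark API): a mergeable accumulator, so that partial results from partitions can be combined and all metrics come out of one distributed pass, and both Evaluators and model summaries could delegate to the same object.

```python
class RegressionMetricsAgg:
    """Hypothetical mergeable accumulator for regression metrics.
    update() folds in one row; merge() combines partial results,
    as would be needed for per-partition aggregation."""

    def __init__(self):
        self.n = 0
        self.abs_err = 0.0
        self.sq_err = 0.0

    def update(self, pred, label):
        # Fold one (prediction, label) row into the running sums.
        err = pred - label
        self.n += 1
        self.abs_err += abs(err)
        self.sq_err += err * err
        return self

    def merge(self, other):
        # Combine two partial accumulators (e.g., from two partitions).
        self.n += other.n
        self.abs_err += other.abs_err
        self.sq_err += other.sq_err
        return self

    @property
    def mae(self):
        return self.abs_err / self.n

    @property
    def rmse(self):
        return (self.sq_err / self.n) ** 0.5
```

Under this shape, a summary class would hold one accumulator and expose its properties, while an Evaluator would build one on the fly from a DataFrame of predictions and labels.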

Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

