spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <>
Subject [jira] [Created] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion
Date Mon, 02 Jan 2017 20:25:58 GMT
Joseph K. Bradley created SPARK-19053:

             Summary: Supporting multiple evaluation metrics in DataFrame-based API: discussion
                 Key: SPARK-19053
             Project: Spark
          Issue Type: Brainstorming
          Components: ML
            Reporter: Joseph K. Bradley

This JIRA is for discussing how to support efficient computation of multiple evaluation metrics
in the DataFrame-based API for MLlib.

In the RDD-based API, RegressionMetrics and the other *Metrics classes support efficient computation
of multiple metrics from a single pass over the data.
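The efficiency point can be sketched in plain Python: accumulate sufficient statistics in one pass over the (prediction, label) pairs and derive every metric from them. This mirrors the idea behind the *Metrics classes; the function and names here are illustrative, not Spark code.

```python
# Illustrative single-pass computation of several regression metrics.
# One scan accumulates sufficient statistics; each metric is then a
# cheap arithmetic combination of those statistics.

def regression_metrics(pairs):
    """pairs: iterable of (prediction, label) tuples."""
    n = 0
    sum_sq_err = 0.0    # sum of squared residuals
    sum_abs_err = 0.0   # sum of absolute residuals
    sum_label = 0.0     # sum of labels
    sum_sq_label = 0.0  # sum of squared labels
    for pred, label in pairs:           # the single pass over the data
        err = pred - label
        n += 1
        sum_sq_err += err * err
        sum_abs_err += abs(err)
        sum_label += label
        sum_sq_label += label * label
    mse = sum_sq_err / n
    mean_label = sum_label / n
    ss_tot = sum_sq_label - n * mean_label * mean_label  # total sum of squares
    return {
        "mse": mse,
        "rmse": mse ** 0.5,
        "mae": sum_abs_err / n,
        "r2": 1.0 - sum_sq_err / ss_tot,
    }
```

Computing the same four metrics with four separate Evaluator calls would instead scan the data four times.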

In the DataFrame-based API, there are a few options:
* model/result summaries (e.g., LogisticRegressionSummary): These currently provide the desired
functionality, but they require a model and do not let users compute metrics directly from
DataFrames of predictions and true labels.
* Evaluator classes (e.g., RegressionEvaluator): These do not require a model, but each computes
only a single metric per pass over the data.
* new class analogous to *Metrics: We could introduce a DataFrame-based class analogous to the
spark.mllib *Metrics classes.  Model/result summaries could use it internally in place of the
spark.mllib Metrics classes, or could perhaps inherit from it.
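The third option can be sketched as a shared summary object that both a model summary and an Evaluator-style interface delegate to. All names below are illustrative assumptions, not an actual Spark API: one aggregation pass over the (prediction, label) rows feeds every metric, so an evaluator asked for several metrics in turn never rescans the data.

```python
# Hypothetical shape for a DataFrame-API analogue of the *Metrics classes.
# Sufficient statistics are computed once in the constructor; each metric
# is then exposed as a cheap derived property.

class RegressionSummarySketch:
    """One pass over (prediction, label) rows; metrics are properties."""
    def __init__(self, rows):
        n = 0
        sum_sq_err = sum_abs_err = sum_label = sum_sq_label = 0.0
        for pred, label in rows:        # the single aggregation pass
            err = pred - label
            n += 1
            sum_sq_err += err * err
            sum_abs_err += abs(err)
            sum_label += label
            sum_sq_label += label * label
        self._n, self._sse, self._sae = n, sum_sq_err, sum_abs_err
        self._sl, self._sql = sum_label, sum_sq_label

    @property
    def mse(self):
        return self._sse / self._n

    @property
    def rmse(self):
        return self.mse ** 0.5

    @property
    def mae(self):
        return self._sae / self._n

    @property
    def r2(self):
        mean = self._sl / self._n
        ss_tot = self._sql - self._n * mean * mean
        return 1.0 - self._sse / ss_tot


class RegressionEvaluatorSketch:
    """Evaluator-style facade: single-metric interface, shared computation."""
    def __init__(self, rows):
        self._summary = RegressionSummarySketch(rows)  # one pass, then cached

    def evaluate(self, metric_name):
        return getattr(self._summary, metric_name)
```

An evaluator asked for "rmse" and then "mae" reuses the cached statistics instead of re-aggregating, which is the efficiency the RDD-based *Metrics classes already provide.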

