spark-issues mailing list archives

From "Ilya Matiach (JIRA)" <>
Subject [jira] [Commented] (SPARK-19053) Supporting multiple evaluation metrics in DataFrame-based API: discussion
Date Mon, 09 Jan 2017 17:46:58 GMT


Ilya Matiach commented on SPARK-19053:

The only problem with the proposed change is that there still needs to be a way to indicate
which metric to optimize. In that case, it may be better to keep the evaluators and the
metrics/summary classes separate, as Joseph wrote above.

Another problem to think about is how to give users instance-based metrics (a metric per row,
which Spark currently does not have), metrics like the confusion matrix, and dataset-based
metrics (accuracy, precision, etc.). They all have different types, so it would probably be
easier to keep the summary-style API to support them.
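
To make the type mismatch concrete, here is a minimal sketch in plain Python, not Spark code. The names `ClassificationSummary`, `summarize`, and `Evaluator` are hypothetical and exist only for this illustration. It assumes a summary object that carries a per-row metric, a confusion matrix, and a scalar metric, plus a separate evaluator whose only job is to name the scalar metric to optimize.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ClassificationSummary:
    """Hypothetical summary object for illustration; not a Spark class."""
    per_row_correct: List[bool]            # instance-based metric: one value per row
    confusion: Dict[Tuple[int, int], int]  # (label, prediction) -> count
    accuracy: float                        # dataset-based scalar metric


def summarize(pairs: List[Tuple[int, int]]) -> ClassificationSummary:
    """Build all three kinds of metric from (label, prediction) pairs."""
    per_row = [label == prediction for label, prediction in pairs]
    confusion = dict(Counter(pairs))
    accuracy = sum(per_row) / len(per_row) if per_row else 0.0
    return ClassificationSummary(per_row, confusion, accuracy)


@dataclass(frozen=True)
class Evaluator:
    """Hypothetical evaluator: it only names the scalar metric to optimize."""
    metric_name: str = "accuracy"

    def evaluate(self, summary: ClassificationSummary) -> float:
        value = getattr(summary, self.metric_name)
        # Only scalar metrics can drive model selection.
        if not isinstance(value, float):
            raise TypeError(f"{self.metric_name} is not a scalar metric")
        return value
```

In this sketch the summary can return values of any type, while the evaluator stays a thin selector over the scalar ones, which is one way to read the suggestion to keep the two kinds of class separate.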

> Supporting multiple evaluation metrics in DataFrame-based API: discussion
> -------------------------------------------------------------------------
>                 Key: SPARK-19053
>                 URL:
>             Project: Spark
>          Issue Type: Brainstorming
>          Components: ML
>            Reporter: Joseph K. Bradley
> This JIRA is to discuss supporting the computation of multiple evaluation metrics efficiently
> in the DataFrame-based API for MLlib.
> In the RDD-based API, RegressionMetrics and other *Metrics classes support efficient
> computation of multiple metrics.
> In the DataFrame-based API, there are a few options:
> * model/result summaries (e.g., LogisticRegressionSummary): These currently provide the
>   desired functionality, but they require a model and do not let users compute metrics manually
>   from DataFrames of predictions and true labels.
> * Evaluator classes (e.g., RegressionEvaluator): These only support computing a single
>   metric in one pass over the data, but they do not require a model.
> * new class analogous to Metrics: We could introduce a class analogous to Metrics.  Model/result
>   summaries could use this internally as a replacement for spark.mllib Metrics classes, or they
>   could (maybe) inherit from these classes.
> Thoughts?
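
As a rough illustration of what "multiple metrics in one pass" means in the description above, the following plain-Python sketch accumulates a few running sums over (prediction, label) pairs and derives several regression metrics from them at the end. It does not use Spark, and the function name `regression_metrics` and its return format are assumptions made for this example, not part of any Spark API.

```python
import math
from typing import Dict, Iterable, Tuple


def regression_metrics(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Compute MSE, RMSE, MAE and R^2 from (prediction, label) pairs in one pass."""
    n = 0
    sum_sq_err = 0.0    # sum of squared errors
    sum_abs_err = 0.0   # sum of absolute errors
    sum_label = 0.0     # sum of labels
    sum_label_sq = 0.0  # sum of squared labels
    for prediction, label in pairs:
        err = label - prediction
        n += 1
        sum_sq_err += err * err
        sum_abs_err += abs(err)
        sum_label += label
        sum_label_sq += label * label
    if n == 0:
        raise ValueError("no rows to evaluate")
    mse = sum_sq_err / n
    # Total sum of squares around the label mean, from the running sums.
    ss_tot = sum_label_sq - sum_label * sum_label / n
    r2 = 1.0 - sum_sq_err / ss_tot if ss_tot > 0 else float("nan")
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum_abs_err / n,
        "r2": r2,
    }
```

Because the input is consumed exactly once, it can be a generator. An evaluator-style interface that returns one metric per call would instead scan the data once for each metric requested.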

This message was sent by Atlassian JIRA
