spark-dev mailing list archives

From Joseph Bradley <>
Subject Re: Evaluation Metrics for Spark's MLlib
Date Thu, 11 Dec 2014 23:23:50 GMT
Hi, I'd recommend starting by checking out the existing helper
functionality for these tasks.  There are helper methods to do K-fold
cross-validation in MLUtils:
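A minimal sketch of those helpers, assuming Spark 1.2's `MLUtils.kFold` (the data file path and the training loop body are placeholders, not part of MLlib):

```scala
import org.apache.spark.mllib.util.MLUtils

// Assumes an existing SparkContext `sc` and a libsvm-format data file.
val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

// kFold returns an Array of (training, validation) RDD pairs,
// one pair per fold.
val folds = MLUtils.kFold(data, numFolds = 10, seed = 42)

folds.foreach { case (training, validation) =>
  // Train on `training` and evaluate on `validation` here.
}
```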

The experimental API in the Spark 1.2 release (in branch-1.2 and
master) has a CrossValidator class which does this more automatically:
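A sketch of that experimental API, assuming the spark.ml pipeline classes shipped in 1.2; the logistic regression estimator and the parameter grid below are just one illustrative choice:

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()

// Candidate hyperparameter settings to search over.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.1, 0.01))
  .build()

// CrossValidator runs k-fold CV over the grid and keeps the best model.
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// `training` is assumed to be a SchemaRDD of labeled examples:
// val cvModel = cv.fit(training)
```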

There are also a few evaluation metrics implemented:
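For example, `BinaryClassificationMetrics` in `org.apache.spark.mllib.evaluation` covers ROC, precision-recall, and threshold-based F-measures; a hedged sketch (the `scoreAndLabels` RDD of (score, label) pairs is assumed to come from your own model):

```scala
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.rdd.RDD

// `scoreAndLabels`: (predicted score, true 0/1 label) pairs from a model.
def report(scoreAndLabels: RDD[(Double, Double)]): Unit = {
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  println(s"Area under ROC = ${metrics.areaUnderROC()}")
  println(s"Area under PR  = ${metrics.areaUnderPR()}")
  // F-beta score at each score threshold (here beta = 0.5).
  metrics.fMeasureByThreshold(beta = 0.5).collect().foreach(println)
}
```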

There definitely could be more metrics and/or better APIs to make it easier
to evaluate models on RDDs.  If you spot such cases, I'd recommend opening
up JIRAs for the new features or improvements to get some feedback before
sending PRs:

Hope this helps & looking forward to the contributions!

On Thu, Dec 11, 2014 at 4:41 AM, kidynamit <> wrote:

> Hi,
> I would like to contribute to Spark's Machine Learning library by adding
> evaluation metrics that could be used to gauge the accuracy of a model
> for a given feature set. In particular, I would like to contribute
> k-fold cross-validation and the F-beta metric, among others, on top of
> the current MLlib framework.
> Please advise on the steps I could take to contribute in this manner.
> Regards,
> kidynamit
