spark-dev mailing list archives

From: Tim Hunter
Subject: Introducing spark-sklearn, a scikit-learn integration package for Spark
Date: Wed, 10 Feb 2016 17:13:59 GMT

Hello community,

Joseph and I would like to introduce a new Spark package that should
be useful for Python users who depend on scikit-learn.

Among other tools, it lets you:
 - train and evaluate multiple scikit-learn models in parallel,
 - convert Spark DataFrames seamlessly into NumPy arrays,
 - (experimental) distribute SciPy sparse matrices as a dataset of
   sparse vectors.

spark-sklearn targets problems that involve a small amount of data and
whose work can be run in parallel. Note that the package distributes
simple, independent tasks such as grid-search cross-validation; unlike
Spark MLlib, it does not distribute the individual learning algorithms
themselves.
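
To make that distinction concrete, here is a minimal sketch of the idea
in plain Python. It does not use spark-sklearn, Spark, or scikit-learn:
a thread pool stands in for the cluster, and the toy ridge regression,
the dataset, and all function names (fit_ridge_1d, evaluate,
grid_search) are assumptions made up for illustration. The point is
only that each (parameter, fold) pair is an independent task that can
be farmed out, while each individual fit still runs on a single worker.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from statistics import mean

# Hypothetical toy dataset: an exact linear relation y = 2*x + 1.
X = [float(i) for i in range(20)]
y = [2.0 * x + 1.0 for x in X]


def fit_ridge_1d(xs, ys, alpha):
    """Closed-form 1-D ridge regression with intercept: (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / (sxx + alpha)  # alpha shrinks the slope toward zero
    return slope, my - slope * mx


def evaluate(task):
    """One independent unit of work: fit on k-1 folds, score on the other."""
    alpha, fold, n_folds = task
    train = [i for i in range(len(X)) if i % n_folds != fold]
    test = [i for i in range(len(X)) if i % n_folds == fold]
    slope, intercept = fit_ridge_1d(
        [X[i] for i in train], [y[i] for i in train], alpha
    )
    mse = mean((slope * X[i] + intercept - y[i]) ** 2 for i in test)
    return alpha, mse


def grid_search(alphas, n_folds=4, workers=4):
    """Run every (alpha, fold) task in parallel, then average per alpha."""
    tasks = [(a, f, n_folds) for a, f in product(alphas, range(n_folds))]
    # The pool plays the role of the cluster: tasks are distributed,
    # but each model fit itself is not split across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, tasks))
    scores = {a: mean(m for b, m in results if b == a) for a in alphas}
    best = min(scores, key=scores.get)  # lowest mean squared error wins
    return best, scores


best_alpha, cv_scores = grid_search([0.0, 1.0, 10.0, 100.0])
```

Because the tasks share nothing but read-only data, this pattern scales
with the number of parameter combinations and folds, not with the size
of the dataset, which is why it suits small-data problems.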

If you want to use it, see instructions on the package page:

This blog post contains more details:

Let us know if you have any questions. Documentation and code
contributions are also very welcome (Apache 2.0 license).


