spark-dev mailing list archives

From Tim Hunter <timhun...@databricks.com>
Subject Introducing spark-sklearn, a scikit-learn integration package for Spark
Date Wed, 10 Feb 2016 17:13:59 GMT
Hello community,
Joseph and I would like to introduce a new Spark package that should
be useful for Python users who depend on scikit-learn.

Among other things, it lets you:
 - train and evaluate multiple scikit-learn models in parallel
 - convert Spark DataFrames seamlessly into NumPy arrays
 - (experimental) distribute SciPy sparse matrices as datasets of
sparse vectors

spark-sklearn focuses on problems where the data is small enough to
fit on a single machine but the work can be run in parallel. Note that
this package distributes simple tasks such as grid-search
cross-validation; it does not distribute the individual learning
algorithms themselves (unlike Spark MLlib).
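To make the idea concrete, here is a conceptual, stdlib-only sketch of
why grid search parallelizes so naturally: each (parameter point, score)
task is independent, so the points can be farmed out to workers. This is
not spark-sklearn's actual API (which plugs into a SparkContext and
scikit-learn estimators); the `evaluate` scoring function below is a
hypothetical stand-in for fitting and cross-validating a model.

```python
# Conceptual sketch of distributed grid search. In spark-sklearn the
# parallelism comes from Spark workers; here a thread pool stands in.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    """Hypothetical stand-in for fitting a model and scoring it via
    cross-validation on one grid point."""
    return params, 1.0 / (1 + params["reg"]) + params["depth"] * 0.01

# A small parameter grid, expanded into its Cartesian product of points.
grid = {"depth": [2, 4, 8], "reg": [0.1, 1.0]}
points = [dict(zip(grid, values)) for values in product(*grid.values())]

# Each grid point is scored independently, so they can run in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(evaluate, points))

best_params, best_score = max(results, key=lambda r: r[1])
```

Because the model itself is never split up, this only helps when one
model fits comfortably on a single machine; that is exactly the regime
spark-sklearn targets.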

If you want to use it, see instructions on the package page:
https://github.com/databricks/spark-sklearn

This blog post contains more details:
https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html

Let us know if you have any questions. Also, documentation and code
contributions are most welcome (Apache 2.0 license).

Cheers

Tim

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

