spark-dev mailing list archives

From Olivier Grisel <olivier.gri...@ensta.org>
Subject Re: [Scikit-learn-general] Spark-backed implementations of scikit-learn estimators
Date Wed, 27 Nov 2013 08:34:21 GMT
> 2013/11/27 Nick Pentreath <nick.pentreath@gmail.com>:
> CC'ing Spark Dev list
>
> I have been thinking about this for quite a while and would really love to
> see this happen.
>
> Most of my pipeline ends up in Scala/Spark these days - which I love, though
> that is partly because I rely on custom Hadoop input formats that are just
> much easier to use from Scala/Java - but I still use Python a lot for data
> analysis and interactive work. There is some good work happening with
> Breeze in Scala and MLlib in Spark (and IScala), but the breadth just
> doesn't compare yet - not to mention IPython and plotting!
>
> A PR that was just merged into PySpark allows arbitrary serialization
> protocols between the Java and Python layers. I hope to use this to let
> PySpark users pull data from arbitrary Hadoop InputFormats with minimal
> fuss. This, I believe, will open the way for many (including me!) to use
> PySpark directly for virtually all distributed data processing without
> "needing" to use Java.
> (https://github.com/apache/incubator-spark/pull/146)
> (http://mail-archives.apache.org/mod_mbox/incubator-spark-dev/201311.mbox/browser).

This is very interesting, thanks for the heads up.
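
For readers wondering what this would look like in practice, here is a minimal
sketch of how a PySpark job might pull records through a Hadoop InputFormat once
such support is exposed. The method name newAPIHadoopFile, its keyword
arguments, and the HDFS path are assumptions chosen for illustration, not the
API defined by the PR itself.

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "inputformat-sketch")

    # Hypothetical call: read (key, value) records from a SequenceFile through
    # the new Java <-> Python serialization layer. The method name and keyword
    # arguments below are assumed for illustration.
    rdd = sc.newAPIHadoopFile(
        "hdfs://namenode:8020/data/events",  # placeholder path
        inputFormatClass="org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat",
        keyClass="org.apache.hadoop.io.Text",
        valueClass="org.apache.hadoop.io.LongWritable",
    )

    # Once the records arrive as plain Python objects, the usual RDD API applies.
    counts = rdd.map(lambda kv: (kv[0], 1)).reduceByKey(lambda a, b: a + b)
    print(counts.take(10))

    sc.stop()

From there the records behave like any other Python RDD, so downstream
scikit-learn-style estimators could consume them without touching Java.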


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
