spark-user mailing list archives

From Matei Zaharia <>
Subject Re: Contributing pyspark ports
Date Sun, 16 Mar 2014 20:21:31 GMT
Unfortunately there isn’t a guide, but you can read a PySpark internals overview at
This would be the thing to follow.

In terms of MLlib and GraphX, I think MLlib will be easier to expose at first — it’s designed
to be easy to call from Java, and we’ve already created bindings for many of the algorithms
that connect with NumPy. (A couple of new algorithms have been added since then though.) GraphX
currently isn’t easy to call from Java and will be even harder to deal with in Python. I’d
start with a Java API for it first.

BTW in both of these we want to call the JVM codebase from Python. That will be a lot more
efficient than implementing the same code in Python, and more maintainable as well.
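
As a rough illustration of that approach (not actual PySpark code), the Python side can stay a thin wrapper that forwards the heavy call to a JVM-side object and only wraps what comes back. Everything named below is hypothetical: `FakeJvmKMeans` is a stand-in so the sketch runs without Spark, and in PySpark the delegate would instead be a Py4J proxy to an object living in the JVM.

```python
# Sketch only: a thin Python wrapper that delegates the heavy work to a
# JVM-side object instead of reimplementing the algorithm in Python.
# All class and function names here are hypothetical, for illustration.


class FakeJvmKMeans:
    """Hypothetical stand-in for a JVM-side trainer reached through Py4J."""

    def train(self, points, k):
        # Pretend the JVM did the work: return the first k points as centers.
        return [list(p) for p in points[:k]]


class KMeansModel:
    """Python-side result object; holds only data returned by the delegate."""

    def __init__(self, centers):
        self.centers = centers

    def predict(self, point):
        # Cheap lookup done locally: index of the nearest center.
        def sq_dist(center):
            return sum((a - b) ** 2 for a, b in zip(center, point))

        return min(range(len(self.centers)),
                   key=lambda i: sq_dist(self.centers[i]))


def train_kmeans(jvm_trainer, points, k):
    """Forward the training call to the JVM-side object and wrap the result."""
    centers = jvm_trainer.train(points, k)
    return KMeansModel(centers)


model = train_kmeans(FakeJvmKMeans(),
                     [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0)], k=2)
print(model.predict((9.0, 8.0)))
```

The point of the split is that the Python layer holds no algorithm logic of its own, so a fix or new feature on the JVM side does not need to be repeated in Python.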


On Mar 16, 2014, at 5:59 AM, Krakna H <> wrote:

> Is there any documentation on contributing pyspark ports of additions to Spark? I only
> see guidelines on Scala contributions (
> Specifically, I'm interested in porting mllib and graphx contributions.
