Hey there,

Here’s something I proposed recently in this space.

It’s motivated by working with a user who wanted to compute some custom statistics: they could write the numpy code and knew along which dimensions it could be parallelized, but when it came to actually getting it running, the type system really got in the way.
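To make the pain point concrete, here’s a minimal sketch of the kind of statistic I mean — the function and trimming fraction are hypothetical, not the user’s actual code. It’s plain numpy, trivially parallel along one axis; the friction comes when wiring something like this into Spark, where the UDF machinery wants explicit result types/schemas up front.

```python
import numpy as np

def trimmed_mean(x, frac=0.1):
    """Mean of x after dropping the lowest and highest `frac` of values."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(len(x) * frac)
    return x[k:len(x) - k].mean()

# Applying it independently per row -- the "parallelizable dimension":
data = np.arange(20, dtype=float).reshape(2, 10)
per_row = np.apply_along_axis(trimmed_mean, 1, data)
# per_row -> array([ 4.5, 14.5])
```

In pure numpy the return type is just inferred; the complaint is that lifting the same function onto Spark forces the author to spell out typing details that numpy never asked for.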

On Fri, Sep 14, 2018 at 15:15 Holden Karau <holden@pigscanfly.ca> wrote:
Since we're talking about Spark 3.0 in the near future (and since some recent conversation on a proposed change reminded me), I wanted to open up the floor and see if folks have any ideas on how we could make a more Python-friendly API for 3.0. I'm planning on taking some time to look at other systems in the solution space and see what we might want to learn from them, but I'd love to hear what other folks are thinking too.

Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9