Hi Chris,

Short answer is no, not yet.

Longer answer is that PySpark currently supports only client mode, which means your driver runs on the same machine as your submission client. As a corollary, your submission client must currently have all of Spark and its dependencies installed. There is a patch that adds PySpark support for cluster mode (as opposed to client mode), which would be the first step towards what you want.
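To make the client/cluster distinction concrete, here is a rough sketch of how a client-mode submission might be assembled. `--master` and `--deploy-mode` are standard `spark-submit` options. The master URL, script name, and Spark install path are hypothetical placeholders. In client mode the driver, and therefore the Python interpreter running your script, lives on the submitting machine. That is why that machine needs a full Spark install and a Python setup compatible with the cluster's.

```python
import shlex


def build_submit_command(master, app_file, deploy_mode="client",
                         spark_home="/opt/spark"):
    """Return the argv list for a spark-submit invocation.

    master, app_file and spark_home are hypothetical placeholders;
    substitute your own cluster URL, script and Spark install path.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        spark_home + "/bin/spark-submit",
        "--master", master,
        # In client mode the driver runs locally, on the submitting machine.
        "--deploy-mode", deploy_mode,
        app_file,
    ]


cmd = build_submit_command("spark://master-host:7077", "my_job.py")
# Print a shell-safe version of the command; pass `cmd` to
# subprocess.run(cmd, check=True) to actually submit the job.
print(" ".join(shlex.quote(arg) for arg in cmd))
```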


2015-01-20 8:36 GMT-08:00 Chris Beavers <cbeavers@trifacta.com>:
Hey all,

Is there any notion of a lightweight Python client for submitting jobs to a Spark cluster remotely? If I essentially install Spark on the client machine, and that machine has the same OS, the same version of Python, etc., then I'm able to communicate with the cluster just fine. But if the Python versions differ even slightly, I start to see a lot of opaque errors that often bubble up as EOFExceptions. Furthermore, this just seems like a very heavyweight way to set up a client.

Does anyone have any suggestions for setting up a thin pyspark client on a node which doesn't necessarily conform to the homogeneity of the target Spark cluster?