spark-user mailing list archives

From Andrew Or <and...@databricks.com>
Subject Re: PySpark Client
Date Tue, 20 Jan 2015 18:34:01 GMT
Hi Chris,

Short answer is no, not yet.

Longer answer is that PySpark currently supports only client mode, which
means your driver runs on the same machine as your submission client. As a
consequence, your submission client must depend on all of Spark and its
dependencies. There is a patch that adds PySpark support for *cluster* mode
(as opposed to client mode), which would be the first step towards what you
want.
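
Because the driver runs on the client in client mode, a Python mismatch
between client and cluster can surface as the opaque errors described in the
original question. A minimal sketch of a pre-flight check you could run on
the client before submitting, assuming you already know the cluster's Python
version. The function names and the version tuple passed in are hypothetical
and not part of Spark; the sketch does not query the cluster:

```python
import sys


def same_minor_version(local, remote):
    """Return True when two (major, minor, ...) tuples share major.minor."""
    return tuple(local[:2]) == tuple(remote[:2])


def check_client_python(cluster_version, local_version=None):
    """Fail fast if the client's Python differs from the cluster's.

    cluster_version is an assumption supplied by the caller, e.g. (2, 7).
    """
    if local_version is None:
        local_version = sys.version_info
    if not same_minor_version(local_version, cluster_version):
        raise RuntimeError(
            "Client Python %d.%d does not match cluster Python %d.%d"
            % (local_version[0], local_version[1],
               cluster_version[0], cluster_version[1])
        )
```

Calling check_client_python((2, 7)) at the top of a driver script raises a
readable error on a mismatched client instead of failing later inside a job.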

-Andrew

2015-01-20 8:36 GMT-08:00 Chris Beavers <cbeavers@trifacta.com>:

> Hey all,
>
> Is there any notion of a lightweight python client for submitting jobs to
> a Spark cluster remotely? If I essentially install Spark on the client
> machine, and that machine has the same OS, same version of Python, etc.,
> then I'm able to communicate with the cluster just fine. But if Python
> versions differ slightly, then I start to see a lot of opaque errors that
> often bubble up as EOFExceptions. Furthermore, this just seems like a very
> heavyweight way to set up a client.
>
> Does anyone have any suggestions for setting up a thin pyspark client on a
> node which doesn't necessarily conform to the homogeneity of the target
> Spark cluster?
>
> Best,
> Chris
>
