At the moment, your best bet for sharing a SparkContext across jobs is the Ooyala job server: https://github.com/ooyala/spark-jobserver

It doesn't yet support Spark 1.0, though I did manage to amend it so that it builds and runs on 1.0.
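
For what it's worth, the underlying idea can be sketched without the job server: one long-lived driver process owns a single SparkContext, caches the file once, and serves every request against that cached RDD. The sketch below is only an illustration under assumptions, not the job server's API. The object name, the input path argument, and the two example queries are all made up, and the master URL is assumed to come from spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical driver: one JVM, one SparkContext, one cached copy of the data.
object SharedContextSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical input path, standing in for the large file in the question.
    val path = args.headOption.getOrElse(sys.error("usage: SharedContextSketch <path>"))

    // The master URL is not set here; it is assumed to be supplied by spark-submit.
    val conf = new SparkConf().setAppName("shared-context-sketch")
    val sc = new SparkContext(conf)
    try {
      // cache() is lazy, so run one action up front to materialize the RDD
      // in executor memory before any requests arrive.
      val lines = sc.textFile(path).cache()
      lines.count()

      // Two stand-ins for client requests. A SparkContext accepts jobs from
      // several threads, and both jobs read the same cached partitions.
      val errorLines = Future { lines.filter(_.contains("ERROR")).count() }
      val totalChars = Future { lines.map(_.length.toLong).fold(0L)(_ + _) }

      println("lines containing ERROR: " + Await.result(errorLines, Duration.Inf))
      println("total characters: " + Await.result(totalChars, Duration.Inf))
    } finally {
      sc.stop()
    }
  }
}
```

In a real deployment the two futures would be replaced by whatever front end the clients talk to (the job server exposes a REST interface for this), but the point is the same: the clients send requests to the process that holds the context instead of each creating their own.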


On Wed, Jul 23, 2014 at 1:21 AM, Asaf Lahav <asaf.lahav@gmail.com> wrote:

Hi Folks,

I have been trying to dig up some information on the options for deploying more than one client process that uses Spark.

Let's say I have a Spark cluster of 10 servers and would like to set up 2 additional servers that send requests to it through a SparkContext, both referencing one specific file containing 1TB of data.

Each client process has its own SparkContext instance.
Currently, the result is that the same file is loaded into memory twice, because SparkContext resources are not shared between processes/JVMs.


I would rather not have that same file loaded again for every new client that is introduced.
What would be the best practice here? Am I missing something?

Thank you,
Asaf