spark-user mailing list archives

From "Nick Pentreath" <nick.pentre...@gmail.com>
Subject Re: Spark clustered client
Date Wed, 23 Jul 2014 06:27:02 GMT
At the moment, your best bet for sharing a SparkContext across jobs is the Ooyala job server: https://github.com/ooyala/spark-jobserver


It doesn't yet support Spark 1.0, though I did manage to amend it to get it to build and run
on 1.0.
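
Roughly, a job for the job server is just a class implementing its SparkJob trait, so every job submitted to the same context reuses one shared SparkContext instead of each client JVM building its own. Something along these lines (an untested sketch; the class name, named-RDD name and config key below are made up, and the trait/NamedRddSupport usage follows the project's README):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{NamedRddSupport, SparkJob, SparkJobInvalid, SparkJobValid, SparkJobValidation}

// Hypothetical example: cache the large file once in the shared context
// and let every subsequent job reuse it as a named RDD.
object SharedFileJob extends SparkJob with NamedRddSupport {

  // Fail fast if the caller didn't pass the input path in the job config.
  def validate(sc: SparkContext, config: Config): SparkJobValidation =
    if (config.hasPath("input.path")) SparkJobValid
    else SparkJobInvalid("Missing input.path")

  def runJob(sc: SparkContext, config: Config): Any = {
    // getOrElseCreate caches the RDD under a name inside the shared
    // SparkContext, so the 1TB file is read once and reused by later jobs.
    val data = namedRdds.getOrElseCreate("shared-file", {
      sc.textFile(config.getString("input.path")).cache()
    })
    data.count()
  }
}

Your two client machines would then submit the jar and job requests to the job server's REST API rather than creating their own SparkContexts.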
—
Sent from Mailbox

On Wed, Jul 23, 2014 at 1:21 AM, Asaf Lahav <asaf.lahav@gmail.com> wrote:

> Hi Folks,
> I have been trying to dig up some information regarding the options for
> deploying more than one client process that consumes Spark.
> Let's say I have a Spark cluster of 10 servers and would like to set up 2
> additional servers which send requests to it through a SparkContext,
> referencing one specific file of 1 TB of data.
> Each client process has its own SparkContext instance.
> Currently, the result is that the same file is loaded into memory twice
> because the SparkContext resources are not shared between processes/JVMs.
> I would rather not have that same file loaded over and over again as
> new clients are introduced.
> What would be the best practice here? Am I missing something?
> Thank you,
> Asaf