spark-user mailing list archives

From Jeremy Freeman <>
Subject Re: Spark on an HPC setup
Date Thu, 29 May 2014 04:37:54 GMT
Hi Sid,

We are successfully running Spark on an HPC cluster, and it works great. Here's some info on our setup and approach.

We have a cluster with 256 nodes running Scientific Linux 6.3 and scheduled by Univa Grid
Engine. The environment also has a DDN GRIDScaler running GPFS and several EMC Isilon clusters
serving NFS to the compute cluster.

We wrote a custom qsub job to spin up Spark dynamically on a user-designated number of nodes.
The UGE scheduler first designates a set of nodes that will be used to run Spark. Once the
nodes are available, we use a script to launch a master and send it the addresses of the
other nodes. The master then starts the workers. At that point, the Spark cluster is usable
and remains active until the user issues a qdel, which triggers a shutdown on the master and
takes down the cluster.
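
The host-selection step could look roughly like the sketch below. It is an illustration only, not the launch scripts described above (those aren't public). It assumes UGE's standard `$PE_HOSTFILE` format (one line per host: hostname, slot count, queue, processor range) and the Spark 1.x standalone conventions of a master listening on port 7077 and a `conf/slaves` file listing worker hostnames. Using the first allocated host as the master, and co-locating a worker on it when only one node is granted, are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class SparkLayout:
    master: str
    workers: list[str]

    @property
    def master_url(self) -> str:
        # 7077 is the default port of the Spark standalone master.
        return f"spark://{self.master}:7077"

    def slaves_file(self) -> str:
        # One worker hostname per line, the format Spark 1.x reads from conf/slaves.
        return "".join(f"{w}\n" for w in self.workers)


def layout_from_pe_hostfile(text: str) -> SparkLayout:
    """Pick a master and workers from the contents of a UGE PE hostfile."""
    hosts: list[str] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        host = fields[0]
        if host not in hosts:  # a host can appear once per queue; keep it once
            hosts.append(host)
    if not hosts:
        raise ValueError("PE hostfile lists no hosts")
    # First host runs the master; the others run workers. With a single node,
    # run the worker next to the master so the cluster still has capacity.
    return SparkLayout(master=hosts[0], workers=hosts[1:] or hosts[:1])
```

Inside a job script, the input would typically be `open(os.environ["PE_HOSTFILE"]).read()`; the resulting `slaves_file()` text and `master_url` can then be handed to Spark's standard `sbin/start-master.sh` and `sbin/start-slaves.sh`, or to whatever site-specific launcher is used instead.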

This worked well for us because users can pick the number of nodes to suit their job, and
multiple users can run their own Spark clusters on the same system (alongside other non-Spark
jobs).
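
Letting users choose the node count at submission time can be sketched as building a qsub command line with the real `-N` (job name) and `-pe` (parallel environment) options. The parallel environment name `spark`, the job name, and the script name `launch_spark.sh` are hypothetical. The sketch also assumes a parallel environment whose allocation rule grants a fixed number of slots per host, so that requesting `nodes * slots_per_node` slots yields that many whole nodes.

```python
def qsub_argv(script: str, nodes: int, slots_per_node: int,
              pe_name: str = "spark", job_name: str = "spark-cluster") -> list[str]:
    """Build a qsub command requesting `nodes` whole hosts through a parallel environment.

    Assumes the (hypothetical) PE allocates exactly `slots_per_node` slots per host,
    so the total slot request maps onto whole nodes.
    """
    if nodes < 1 or slots_per_node < 1:
        raise ValueError("nodes and slots_per_node must be positive")
    total_slots = nodes * slots_per_node
    return ["qsub", "-N", job_name, "-pe", pe_name, str(total_slots), script]
```

The returned list can be passed to `subprocess.run(argv, check=True)` to submit the job; each user's submission gets its own allocation, which is what allows several independent Spark clusters to coexist on the same system.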

We don't use HDFS for the filesystem, instead relying on NFS and GPFS, and the cluster is
not running Hadoop. In tests, we've seen similar performance between our setup and Spark
with HDFS on EC2 using higher-end instances (matched roughly for memory and number of cores).
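
Without HDFS, Spark reads input from the shared filesystems through `file://` URIs, which works only when every worker sees the same file at the same path. A small guard for that could look like the sketch below; the mount points `/gpfs` and `/nfs` are hypothetical placeholders for whatever paths a site's GPFS and NFS exports actually use.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

# Hypothetical mount points of the shared filesystems; adjust for your site.
SHARED_MOUNTS = (PurePosixPath("/gpfs"), PurePosixPath("/nfs"))


def shared_uri(path: str) -> str:
    """Return a file:// URI for a path on a shared mount, or raise ValueError.

    Paths outside the shared mounts would exist only on the driver's node,
    so workers would fail to find them.
    """
    p = PurePosixPath(path)
    if not p.is_absolute() or ".." in p.parts:
        raise ValueError(f"need an absolute path without '..': {path}")
    if not any(p == m or m in p.parents for m in SHARED_MOUNTS):
        raise ValueError(f"not on a shared mount: {path}")
    # quote() leaves '/' alone and percent-encodes spaces and other unsafe characters.
    return "file://" + quote(str(p))
```

In PySpark the result would be passed straight to `sc.textFile(...)`, e.g. `sc.textFile(shared_uri("/gpfs/data/run1/part-0.txt"))`.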

Unfortunately we can't open-source the launch scripts because they contain proprietary UGE
stuff, but I'm happy to try to answer any follow-up questions.

-- Jeremy

Jeremy Freeman, PhD

On May 28, 2014, at 11:02 AM, Sidharth Kashyap <> wrote:

> Hi,
> Has anyone tried to get Spark working on an HPC setup?
> If yes, can you please share your learnings and how you went about doing it?
> An HPC setup typically comes bundled with a dynamically allocated cluster and a very efficient
> Configuring Spark standalone in this mode of operation is challenging, as the Hadoop dependencies
> need to be eliminated and the cluster needs to be configured on the fly.
> Thanks,
> Sid
