Have a look at scheduling pools. If you want more sophisticated resource allocation, you are better off using a cluster manager like Mesos or YARN.
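To make this concrete: on a standalone cluster, capping each application's cores is done with spark.cores.max (or spark.deploy.defaultCores as a cluster-wide default), while scheduling pools apply to jobs *within* a single application. A minimal sketch, assuming standalone mode and a hypothetical path for the allocation file:

```
# spark-defaults.conf — sketch: cap each application so the standalone
# cluster can run several applications at once
spark.cores.max                  8

# fair scheduling between jobs inside one application
spark.scheduler.mode             FAIR
spark.scheduler.allocation.file  /path/to/fairscheduler.xml
```

```xml
<!-- fairscheduler.xml — sketch of a pool definition; pool names and
     weights here are illustrative, not from the original thread -->
<allocations>
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>2</weight>
    <minShare>4</minShare>
  </pool>
</allocations>
```

Note that spark.cores.max is a static cap, so it reintroduces the under-utilization trade-off described below; elastic sharing across applications is what Mesos or YARN provide.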

Best Regards

On Mon, Nov 3, 2014 at 9:10 PM, Romi Kuntsman <romi@totango.com> wrote:

I have a Spark 1.1.0 standalone cluster, with several nodes, and several jobs (applications) being scheduled at the same time.
By default, each Spark job takes up all available CPUs.
This way, when more than one job is scheduled, all but the first are stuck in "WAITING".
On the other hand, if I tell each job to initially limit itself to a fixed number of CPUs, and that job runs by itself, the cluster is under-utilized and the job runs longer than it could have if it took all the available resources.

- How can resources be divided more fairly, so that many jobs run concurrently while together still using all the available resources?
- How do you divide resources between applications in your use case?

P.S. I started reading about Mesos but couldn't figure out if/how it could solve the described issue.


Romi Kuntsman, Big Data Engineer