spark-dev mailing list archives

From "Shao, Saisai" <saisai.s...@intel.com>
Subject RE: Questions about Spark standalone resource scheduler
Date Mon, 02 Feb 2015 09:20:07 GMT
Hi Patrick,

Thanks a lot for your detailed explanation. For now we have the following requirements: whitelisting
application submitters, per-user resource (CPU, memory) quotas, and resource allocation in Spark
standalone mode. These are quite specific production requirements; in general the question becomes
whether we need to offer a more advanced resource scheduler than the current simple FIFO one. Our
aim is not to provide a general resource scheduler like Mesos/YARN, since we only support Spark,
but we hope to add some Mesos/YARN functionality to make better use of Spark standalone mode.
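
(For reference, a minimal sketch of the knobs standalone mode already exposes for capping an
application's resources; the master host and values are placeholders. Today these are strictly
per-application settings, with no per-user quota or submitter whitelist:

  import org.apache.spark.{SparkConf, SparkContext}

  // Sketch only: per-application resource caps in standalone mode.
  // "master-host" and the values are placeholders.
  val conf = new SparkConf()
    .setMaster("spark://master-host:7077")
    .setAppName("quota-sketch")
    .set("spark.cores.max", "8")        // cap on total cores this app may take
    .set("spark.executor.memory", "4g") // memory per executor

  val sc = new SparkContext(conf)

A cluster-wide default cap can also be set on the master via spark.deploy.defaultCores for apps
that leave spark.cores.max unset, but that is still per-application rather than per-user.)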

I admit that a resource scheduler may have some overlap with a cluster manager; whether to offer
a powerful scheduler ourselves or rely on a cluster manager is really a dilemma.

I think we can break this down into some small features to improve standalone mode. What's your
opinion?

Thanks
Jerry

-----Original Message-----
From: Patrick Wendell [mailto:pwendell@gmail.com] 
Sent: Monday, February 2, 2015 4:49 PM
To: Shao, Saisai
Cc: dev@spark.apache.org; user@spark.apache.org
Subject: Re: Questions about Spark standalone resource scheduler

Hey Jerry,

I think standalone mode will still add more features over time, but the goal isn't really
for it to become equivalent to what Mesos/YARN are today. Or at least, I doubt Spark Standalone
will ever attempt to manage _other_ frameworks outside of Spark and become a general purpose
resource manager.

In terms of having better support for multi-tenancy, meaning multiple
*Spark* instances, this is something I think could be in scope in the future. For instance,
we added H/A to the standalone scheduler a while back, because it let us support H/A streaming
apps in a totally native way. It's a trade-off between adding new features and keeping the
scheduler very simple and easy to use. We've tended to bias towards simplicity as the main goal,
since this is something we want to be really easy "out of the box".
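
(For context, the H/A mentioned above is driven by a handful of master-side properties. A rough
sketch below only lists them, with placeholder ZooKeeper hosts; in practice they are passed to the
master JVM, e.g. via SPARK_DAEMON_JAVA_OPTS in spark-env.sh:

  // Recovery properties behind standalone master H/A; zk1/zk2 are placeholders.
  val haMasterOpts = Map(
    "spark.deploy.recoveryMode"  -> "ZOOKEEPER",
    "spark.deploy.zookeeper.url" -> "zk1:2181,zk2:2181",
    "spark.deploy.zookeeper.dir" -> "/spark"
  )

  // Rendered as JVM system properties for the master process.
  val daemonJavaOpts = haMasterOpts.map { case (k, v) => s"-D$k=$v" }.mkString(" "))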

One thing to point out: a lot of people use standalone mode with some coarser-grained
scheduler, such as running in a cloud service. In this case they really just want a simple
"inner" cluster manager. This may even be the majority of all Spark installations. This is
slightly different from Hadoop environments, where they might just want nice integration into
the existing Hadoop stack via something like YARN.
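
(As a rough illustration of that split, the same application code can target either the inner
standalone master or YARN just by changing the master URL. Host names are placeholders, the
"yarn-cluster" value is the Spark 1.x-era syntax, and in practice the master is usually passed
via spark-submit --master rather than hard-coded:

  import org.apache.spark.SparkConf

  // Cloud-hosted cluster: point at the standalone master running inside it.
  val standaloneConf = new SparkConf()
    .setAppName("cloud-hosted-app")
    .setMaster("spark://master-host:7077")

  // Existing Hadoop stack: let YARN do the resource management instead.
  val yarnConf = new SparkConf()
    .setAppName("hadoop-stack-app")
    .setMaster("yarn-cluster"))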

- Patrick

On Mon, Feb 2, 2015 at 12:24 AM, Shao, Saisai <saisai.shao@intel.com> wrote:
> Hi all,
>
>
>
> I have some questions about the future development of Spark's
> standalone resource scheduler. We've heard that some users need
> multi-tenant support in standalone mode, such as multi-user
> management, resource management and isolation, and a whitelist of
> users. The current Spark standalone mode does not seem to support
> such functionality, while resource schedulers like YARN offer this
> kind of advanced management. I'm not sure what the future target of
> the standalone resource scheduler is: will it only target a simple
> implementation and shift advanced usage to YARN, or will it plan to
> add some simple multi-tenant functionality?
>
>
>
> Thanks a lot for your comments.
>
>
>
> BR
>
> Jerry

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org

