spark-user mailing list archives

From Jianshi Huang <jianshi.hu...@gmail.com>
Subject Re: Multitenancy in Spark - within/across spark context
Date Thu, 23 Oct 2014 09:56:39 GMT
Upvote for the multitenancy requirement.

I'm also building a data analytics platform, and there will be multiple users
running queries and computations simultaneously. One of the pain points is
controlling resource size. Users don't really know how many nodes they need,
so they always use as many as possible... The result is lots of wasted
resources in our YARN cluster.

A way to 1) allow multiple SparkContexts to share the same resources, or 2)
add dynamic resource management for YARN mode is very much wanted.

Jianshi

On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin <vanzin@cloudera.com> wrote:

> On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
> <ashwinshankar77@gmail.com> wrote:
> >> That's not something you might want to do usually. In general, a
> >> SparkContext maps to a user application
> >
> > My question was basically this. On this page in the official docs, under
> > the "Scheduling within an application" section, it talks about multiuser
> > and fair sharing within an app. How does multiuser within an application
> > work (how do users connect to an app and run their stuff)? When would I
> > want to use this?
>
> I see. The way I read that page is that Spark supports all those
> scheduling options; but Spark doesn't give you the means to actually
> be able to submit jobs from different users to a running SparkContext
> hosted on a different process. For that, you'll need something like
> the job server that I referenced before, or write your own framework
> for supporting that.
>
> Personally, I'd use the information on that page when dealing with
> concurrent jobs in the same SparkContext, but still restricted to the
> same user. I'd avoid trying to create any application where a single
> SparkContext is trying to be shared by multiple users in any way.
>
> >> As far as I understand, this will cause executors to be killed, which
> >> means that Spark will start retrying tasks to rebuild the data that
> >> was held by those executors when needed.
> >
> > I basically wanted to find out if there were any "gotchas" related to
> > preemption on Spark. For example, if half of an application's executors
> > got preempted while doing a reduceByKey, would the application progress
> > with the remaining resources/fair share?
>
> Jobs should still make progress as long as at least one executor is
> available. The gotcha would be the one I mentioned, where Spark will
> fail your job after "x" executors failed, which might be a common
> occurrence when preemption is enabled. That being said, it's a
> configurable option, so you can set "x" to a very large value and your
> job should keep on chugging along.
>
> The options you'd want to take a look at are: spark.task.maxFailures
> and spark.yarn.max.executor.failures
>
> --
> Marcelo
>
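Marcelo's suggestion of using the "Scheduling within an application" options for concurrent jobs in one SparkContext (same user) can be sketched as follows. This is a minimal Python sketch, assuming the FAIR scheduler's XML allocation-file format (an `<allocations>` root holding `<pool>` elements with `schedulingMode`, `weight` and `minShare`); the helper `build_fair_scheduler_xml` and the pool names `interactive` and `batch` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_fair_scheduler_xml(pools):
    """Return the text of a fair-scheduler allocation file.

    `pools` maps a (hypothetical) pool name to a dict that may contain
    "weight" and "minShare"; missing values fall back to 1 and 0.
    """
    root = ET.Element("allocations")
    for name, opts in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        # Jobs inside this pool are themselves scheduled fairly.
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        # Relative share of the cluster compared with other pools.
        ET.SubElement(pool, "weight").text = str(opts.get("weight", 1))
        # Minimum number of cores the pool should get when it has work.
        ET.SubElement(pool, "minShare").text = str(opts.get("minShare", 0))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_fair_scheduler_xml({
        "interactive": {"weight": 2, "minShare": 2},  # hypothetical pool
        "batch": {},                                  # hypothetical pool
    }))
```

To use such a file, one would set `spark.scheduler.mode=FAIR` and point `spark.scheduler.allocation.file` at it; each thread that submits jobs can then pick a pool with `sc.setLocalProperty("spark.scheduler.pool", "interactive")`. This only divides one SparkContext among concurrent jobs; it does not by itself let other users submit work to that context.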


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
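The two settings Marcelo points to (spark.task.maxFailures and spark.yarn.max.executor.failures) can be raised at submit time so that preemption-driven executor losses are less likely to fail the job. A minimal Python sketch, assuming the application is launched with `spark-submit` and its `--conf key=value` flag; the helper name `preemption_tolerant_conf` and the defaults 16 and 1000 are hypothetical stand-ins for "a very large value", not recommendations.

```python
def preemption_tolerant_conf(task_max_failures=16, max_executor_failures=1000):
    """Build spark-submit --conf arguments that raise Spark's failure limits.

    The default numbers are illustrative only; pick values that fit how
    often executors are preempted on your cluster.
    """
    settings = {
        # Number of failures of any single task before the job is aborted.
        "spark.task.maxFailures": task_max_failures,
        # Number of executor failures tolerated before the YARN app fails.
        "spark.yarn.max.executor.failures": max_executor_failures,
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(["spark-submit", *preemption_tolerant_conf(), "my_app.py"]))
```

Here `my_app.py` is a placeholder; the resulting list could also be handed to `subprocess.run`. Keeping the limits as parameters makes the trade-off explicit: higher values tolerate more preemption, but a genuinely broken job takes longer to give up.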
