spark-user mailing list archives

From Evan Chan <velvia.git...@gmail.com>
Subject Re: Multitenancy in Spark - within/across spark context
Date Fri, 24 Oct 2014 04:35:51 GMT
Ashwin,

I would say the strategies in general are:

1) Have each user submit a separate Spark app (each with its own
SparkContext and its own resource settings), and share data through
HDFS, or something like Tachyon for speed.

2) Share a single SparkContext among multiple users, using the fair
scheduler. This is sort of like having a Hadoop resource pool. It
has some obvious HA/SPOF issues, namely that if the context dies then
every user of it is also dead. Also, sharing RDDs in cached memory
has a similar resiliency problem: if any executor dies, Spark must
recompute / rebuild the RDD (it tries to rebuild only the missing
part, but sometimes it must rebuild everything).
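For strategy 1, the per-user resource settings are fixed when each application is submitted. The sketch below assumes YARN and builds one `spark-submit` command per user. The `UserApp` type, the class and jar names, the queue name and all sizes are hypothetical, and the flags are spelled as in current Spark releases (2014-era releases used `--master yarn-cluster` instead of `--master yarn --deploy-mode cluster`).

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class UserApp:
    """Hypothetical per-user submission settings (all field names are illustrative)."""
    user: str
    main_class: str
    jar: str
    num_executors: int
    executor_memory: str  # JVM-style size string, e.g. "4g"
    executor_cores: int
    queue: str  # YARN queue to submit into


def submit_args(app: UserApp) -> List[str]:
    """Build the spark-submit argv for one user's own application.

    Each invocation starts a separate driver, hence a separate SparkContext,
    with a fixed executor count, executor memory and cores on YARN.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--name", f"{app.user}-{app.main_class}",
        "--queue", app.queue,
        "--num-executors", str(app.num_executors),
        "--executor-memory", app.executor_memory,
        "--executor-cores", str(app.executor_cores),
        "--class", app.main_class,
        app.jar,
    ]


if __name__ == "__main__":
    alice = UserApp("alice", "com.example.Report", "report.jar", 4, "4g", 2, "analytics")
    print(shlex.join(submit_args(alice)))
```

Because each user's app is a separate process with its own cap, one user over-asking only affects their own application; data produced by one app reaches another through HDFS (or Tachyon) paths rather than through shared cached RDDs.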

Job server can help with either 1 or 2, but especially 2. If you have
any questions about job server, feel free to ask in the spark-jobserver
Google group. I am the maintainer.
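For strategy 2, fair sharing inside one SparkContext is configured with `spark.scheduler.mode=FAIR` plus an allocation file that defines scheduler pools. The sketch below renders such a file with one pool per user. The user names, weights, `minShare` values (a number of CPU cores) and the file path are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Driver-side settings that switch the shared SparkContext to fair scheduling.
# The allocation-file path is a hypothetical placeholder.
DRIVER_CONF: Dict[str, str] = {
    "spark.scheduler.mode": "FAIR",
    "spark.scheduler.allocation.file": "/etc/spark/fairscheduler.xml",
}


def fair_scheduler_xml(pools: Dict[str, Tuple[int, int]]) -> str:
    """Render a fair-scheduler allocation file with one pool per user.

    `pools` maps a pool name to (weight, minShare). Jobs inside each pool
    are also scheduled FAIR here, rather than the default FIFO.
    """
    root = ET.Element("allocations")
    for name, (weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = "FAIR"
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative shares: alice gets twice bob's weight and at least one core.
    print(fair_scheduler_xml({"alice": (2, 1), "bob": (1, 0)}))
```

Inside the shared context, the thread running a given user's work calls `sc.setLocalProperty("spark.scheduler.pool", "alice")` before triggering actions, so those jobs land in that user's pool. Getting requests from different users into that context in the first place is the part Spark leaves to something like job server.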

-Evan


On Thu, Oct 23, 2014 at 1:06 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
> You may want to take a look at https://issues.apache.org/jira/browse/SPARK-3174.
>
> On Thu, Oct 23, 2014 at 2:56 AM, Jianshi Huang <jianshi.huang@gmail.com> wrote:
>> Upvote for the multitenancy requirement.
>>
>> I'm also building a data analytics platform, and there will be multiple
>> users running queries and computations simultaneously. One of the pain
>> points is controlling resource size. Users don't really know how many
>> nodes they need, so they always use as much as possible... The result is
>> lots of wasted resources in our YARN cluster.
>>
>> A way to 1) allow multiple Spark contexts to share the same resources, or
>> 2) add dynamic resource management for YARN mode, is very much wanted.
>>
>> Jianshi
>>
>> On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>
>>> On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
>>> <ashwinshankar77@gmail.com> wrote:
>>> >> That's not something you might want to do usually. In general, a
>>> >> SparkContext maps to a user application
>>> >
>>> > My question was basically this. On this page in the official docs,
>>> > the "Scheduling within an application" section talks about multiple
>>> > users and fair sharing within an app. How does multi-user access within
>>> > an application work (how do users connect to an app and run their
>>> > stuff)? When would I want to use this?
>>>
>>> I see. The way I read that page is that Spark supports all those
>>> scheduling options, but Spark doesn't give you the means to actually
>>> submit jobs from different users to a running SparkContext hosted in
>>> a different process. For that, you'll need something like the job
>>> server that I referenced before, or you'll need to write your own
>>> framework to support it.
>>>
>>> Personally, I'd use the information on that page when dealing with
>>> concurrent jobs in the same SparkContext, but still restricted to the
>>> same user. I'd avoid trying to create any application where a single
>>> SparkContext is trying to be shared by multiple users in any way.
>>>
>>> >> As far as I understand, this will cause executors to be killed, which
>>> >> means that Spark will start retrying tasks to rebuild the data that
>>> >> was held by those executors when needed.
>>> >
>>> > I basically wanted to find out if there were any "gotchas" related to
>>> > preemption on Spark. For example, if half of an application's executors
>>> > got preempted while doing a reduceByKey, would the application still
>>> > progress with the remaining resources/fair share?
>>>
>>> Jobs should still make progress as long as at least one executor is
>>> available. The gotcha would be the one I mentioned, where Spark will
>>> fail your job after "x" executor failures, which might be a common
>>> occurrence when preemption is enabled. That being said, it's a
>>> configurable option, so you can set "x" to a very large value and your
>>> job should keep chugging along.
>>>
>>> The options you'd want to look at are spark.task.maxFailures
>>> and spark.yarn.max.executor.failures.
>>>
>>> --
>>> Marcelo
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>
>>
>>
>> --
>> Jianshi Huang
>>
>> LinkedIn: jianshi
>> Twitter: @jshuang
>> Github & Blog: http://huangjs.github.com/
>
>
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
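For the preemption concern in the quoted thread, the two limits Marcelo names can be raised at submit time with repeated `--conf KEY=VALUE` flags. The sketch below only builds those flags. The threshold values and `app.jar` are hypothetical, not recommendations; `spark.task.maxFailures` defaults to 4, and property names can change between Spark releases, so check the configuration page for the version you run.

```python
from typing import Dict, List

# Illustrative thresholds only. spark.task.maxFailures is the number of failures
# of any one task before the job is given up; spark.yarn.max.executor.failures
# is the YARN-mode executor-failure limit named in the thread.
PREEMPTION_TOLERANT_CONF: Dict[str, str] = {
    "spark.task.maxFailures": "16",
    "spark.yarn.max.executor.failures": "1000",
}


def conf_args(conf: Dict[str, str]) -> List[str]:
    """Turn a property map into repeated `--conf KEY=VALUE` spark-submit flags."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # app.jar is a placeholder for the application's own jar.
    print(" ".join(["spark-submit", *conf_args(PREEMPTION_TOLERANT_CONF), "app.jar"]))
```

Sorting the keys just keeps the generated command stable; Spark does not care about the order of `--conf` flags.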
