spark-user mailing list archives

From Marcelo Vanzin <van...@cloudera.com>
Subject Re: Multitenancy in Spark - within/across spark context
Date Wed, 22 Oct 2014 21:36:00 GMT
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
<ashwinshankar77@gmail.com> wrote:
>> That's not something you might want to do usually. In general, a
>> SparkContext maps to a user application.
>
> My question was basically this. On this page in the official docs, under
> the "Scheduling within an application" section, it talks about multiuser and
> fair sharing within an app. How does multiuser within an application
> work (how do users connect to an app and run their stuff)? When would I want
> to use this?

I see. The way I read that page, Spark supports all of those
scheduling options, but it doesn't give you a way to submit jobs
from different users to a running SparkContext hosted in a different
process. For that, you'll need something like the job server that I
referenced before, or you'll have to write your own framework to
support it.

Personally, I'd use the information on that page when dealing with
concurrent jobs in the same SparkContext, but still restricted to the
same user. I'd avoid building any application where a single
SparkContext is shared by multiple users.
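For the single-user, concurrent-jobs case, the scheduling page configures fair sharing through pools described in an XML allocation file. The sketch below only generates such a file. It assumes the `<allocations>`/`<pool>` layout with `schedulingMode`, `weight` and `minShare` elements, plus the `spark.scheduler.mode` and `spark.scheduler.allocation.file` settings, as described in Spark's job scheduling documentation; check that page for your version. The pool names, weights, minShare values and file path are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical pools: name -> (schedulingMode, weight, minShare).
POOLS = {
    "interactive": ("FAIR", 2, 2),
    "batch": ("FIFO", 1, 0),
}

# Settings that enable FAIR scheduling between pools and point the
# SparkContext at the allocation file (the path is a made-up example).
SPARK_CONF = {
    "spark.scheduler.mode": "FAIR",
    "spark.scheduler.allocation.file": "/path/to/fairscheduler.xml",
}


def build_allocation_file(pools):
    """Return the XML text of a fair-scheduler allocation file."""
    root = ET.Element("allocations")
    for name, (mode, weight, min_share) in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = mode
        ET.SubElement(pool, "weight").text = str(weight)
        ET.SubElement(pool, "minShare").text = str(min_share)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_allocation_file(POOLS))
```

Inside the application, each thread that submits jobs would then pick its pool with `sc.setLocalProperty("spark.scheduler.pool", "interactive")` before running them. Jobs from threads that never set a pool go to the default pool. All of this stays inside one SparkContext owned by one user, which matches the usage suggested above.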

>> As far as I understand, this will cause executors to be killed, which
>> means that Spark will start retrying tasks to rebuild the data that
>> was held by those executors when needed.
>
> I basically wanted to find out if there were any "gotchas" related to
> preemption on Spark. For example, if half of an application's executors got
> preempted while it was doing a reduceByKey, will the application progress
> with the remaining resources/fair share?

Jobs should still make progress as long as at least one executor is
available. The gotcha is the one I mentioned: Spark will fail your job
after "x" executors have failed, which might happen often when
preemption is enabled. That said, "x" is configurable, so you can set
it to a very large value and your job should keep chugging along.

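The options you'd want to look at are spark.task.maxFailures (the
number of times any single task may fail before the job is aborted)
and spark.yarn.max.executor.failures (the number of executor failures
tolerated before the whole YARN application is failed).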
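Here is a minimal sketch of raising both thresholds at submission time. It is written in Python so it runs without a Spark installation, and it only builds a `spark-submit` argument list, passing each setting with `--conf key=value`. The values 20 and 1000 and the jar name `my-app.jar` are illustrative assumptions, not recommendations. The option names are the ones from this thread, so check the configuration page for the Spark version you run.

```python
import shlex

# Illustrative thresholds only; tune them for your cluster and preemption rate.
FAILURE_TOLERANCE = {
    # Times any single task may fail before the job is aborted (default is 4).
    "spark.task.maxFailures": 20,
    # Executor failures tolerated before the YARN application is failed.
    "spark.yarn.max.executor.failures": 1000,
}


def spark_submit_args(app_jar, conf):
    """Build a spark-submit argument list, passing each setting via --conf."""
    args = ["spark-submit"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_jar)
    return args


if __name__ == "__main__":
    # Prints a shell-quoted command line for the hypothetical jar.
    print(shlex.join(spark_submit_args("my-app.jar", FAILURE_TOLERANCE)))
```

The same keys could instead go in `spark-defaults.conf` or be set on a `SparkConf` before the SparkContext is created.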

-- 
Marcelo


