spark-user mailing list archives

From yana <>
Subject RE: Don't understand "schedule jobs within an Application
Date Mon, 01 Jun 2015 13:12:49 GMT
1. Yes, if two tasks depend on each other, they can't run in parallel.
2. Imagine something like a web application driver. You only get one Spark context, but
now you want to run many concurrent jobs. They have nothing to do with each other, so there
is no reason to keep them sequential.
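
To make the two points concrete, here is a minimal sketch in plain Python. It does not use any Spark API: `run_job` is a hypothetical stand-in for a Spark action, and the thread pool plays the role of a web application's request threads. In a real driver, each thread would call actions on the one shared Spark context, which is how jobs end up being submitted concurrently within a single application.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(name, data):
    # Hypothetical stand-in for a Spark action (e.g. a count or a sum).
    return name, sum(data)


# Point 2: independent jobs. Each request thread submits its own job,
# and neither needs the other's result, so they may overlap in time.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(run_job, "user_a", range(10)),
        pool.submit(run_job, "user_b", range(100)),
    ]
    independent = dict(f.result() for f in futures)

# Point 1: dependent jobs. The second job consumes the first job's
# result, so it cannot start until the first has finished.
_, first = run_job("first", range(10))
_, second = run_job("second", [first, 1])

print(independent, second)
```

The dependent pair is forced to be sequential by the data flow itself, regardless of how the scheduler is configured; only the independent jobs give the scheduler anything to interleave.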

Hope this helps

-------- Original message --------
From:
Date: 06/01/2015 4:14 AM (GMT-05:00)
To: user <>
Subject: Don't understand "schedule jobs within an Application

Hi, sparks,

The passage at the end of this message is copied from the Spark online documentation.

Basically, I have two questions on it:

1. If two jobs in an application have a dependency, that is, one job depends on the result of
the other job, then I think they will have to run sequentially.
2. Since job scheduling happens within one application, I don't think job scheduling will
benefit multiple users, as the last sentence says. In my opinion, multiple users can benefit
only from cross-application scheduling.

Maybe I haven't understood job scheduling well. Could someone elaborate on this?
Thanks very much

By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into “stages”
(e.g. map and reduce phases), and the first job gets priority on all available resources while
its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the
head of the queue don’t need to use the whole cluster, later jobs can start to run right
away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.
Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair
sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs
get a roughly equal share of cluster resources. This means that short jobs submitted while
a long job is running can start receiving resources right away and still get good response
times, without waiting for the long job to finish. This mode is best for multi-user settings.
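
The difference between the two modes described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not Spark's actual scheduler: every task takes exactly one step, all jobs are submitted at step 0, each job has at least one task, and the cluster runs a fixed number of tasks per step. The function name `simulate` and its parameters are made up for this example.

```python
def simulate(jobs, slots, mode):
    """Return {job_name: step at which the job's last task finished}.

    jobs:  list of (name, task_count) in submission order
    slots: number of tasks the toy cluster can run per step
    mode:  "FIFO" or "FAIR"
    """
    remaining = dict(jobs)  # dicts keep insertion (submission) order
    finished = {}
    step = 0
    while remaining:
        step += 1
        free = slots
        if mode == "FIFO":
            # Earlier jobs get first claim on every free slot;
            # later jobs only receive what is left over.
            for name in remaining:
                take = min(free, remaining[name])
                remaining[name] -= take
                free -= take
                if free == 0:
                    break
        else:
            # Round robin: hand out one slot at a time, cycling over
            # all jobs that still have tasks to launch.
            while free > 0 and any(remaining.values()):
                for name in remaining:
                    if free == 0:
                        break
                    if remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        # Record and drop every job that completed in this step.
        for name in [n for n, left in remaining.items() if left == 0]:
            finished[name] = step
            del remaining[name]
    return finished


jobs = [("long", 8), ("short", 2)]
print(simulate(jobs, slots=2, mode="FIFO"))
print(simulate(jobs, slots=2, mode="FAIR"))
```

With these inputs, the short job finishes at step 5 under FIFO because it waits behind the long job, but at step 2 under round robin, while the long job is delayed by only one step (from step 4 to step 5).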