spark-user mailing list archives

From "bit1129@163.com" <bit1...@163.com>
Subject Don't understand "schedule jobs within an Application"
Date Mon, 01 Jun 2015 08:14:39 GMT
Hi, Spark users,

The following is copied from the Spark online documentation: http://spark.apache.org/docs/latest/job-scheduling.html


Basically, I have two questions on it:

1. If two jobs in an application have dependencies, that is, one job depends on the result of
the other, then I think they will have to run sequentially.
2. Since job scheduling happens within one application, I don't think job scheduling will
benefit multiple users as the last sentence says. In my opinion, multiple users can benefit
only from cross-application scheduling. (See the sketch right below this list for what I mean
by independent jobs inside one application.)
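
For reference, here is a rough sketch of what I mean by independent jobs inside one application
(the class name, master setting, and the toy computations are just my own untested assumptions):

import org.apache.spark.{SparkConf, SparkContext}

object ConcurrentJobsSketch {
  def main(args: Array[String]): Unit = {
    // One application, one SparkContext.
    val sc = new SparkContext(new SparkConf().setAppName("concurrent-jobs").setMaster("local[4]"))

    // Two independent jobs, each triggered by its own action from a separate thread,
    // so the scheduler inside this single application can interleave their tasks.
    val t1 = new Thread(new Runnable {
      def run(): Unit = println("sum = " + sc.parallelize(1 to 1000000).map(_ * 2).sum())
    })
    val t2 = new Thread(new Runnable {
      def run(): Unit = println("count = " + sc.parallelize(1 to 1000000).filter(_ % 3 == 0).count())
    })
    t1.start(); t2.start()
    t1.join(); t2.join()

    sc.stop()
  }
}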

Maybe I don't have a good understanding of job scheduling yet; could someone elaborate on this?
Thanks very much.






By default, Spark’s scheduler runs jobs in FIFO fashion. Each job is divided into “stages”
(e.g. map and reduce phases), and the first job gets priority on all available resources while
its stages have tasks to launch, then the second job gets priority, etc. If the jobs at the
head of the queue don’t need to use the whole cluster, later jobs can start to run right
away, but if the jobs at the head of the queue are large, then later jobs may be delayed significantly.

Starting in Spark 0.8, it is also possible to configure fair sharing between jobs. Under fair
sharing, Spark assigns tasks between jobs in a “round robin” fashion, so that all jobs
get a roughly equal share of cluster resources. This means that short jobs submitted while
a long job is running can start receiving resources right away and still get good response
times, without waiting for the long job to finish. This mode is best for multi-user settings.
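
For completeness, a minimal sketch of how fair sharing could be turned on inside one application,
using the property names from that same page (the app name and pool name are just made-up examples):

import org.apache.spark.{SparkConf, SparkContext}

// Fair scheduling is configured per application; the default scheduler mode is FIFO.
val conf = new SparkConf()
  .setAppName("fair-scheduling-sketch")
  .set("spark.scheduler.mode", "FAIR")
val sc = new SparkContext(conf)

// Each thread (for example, each user session in a shared server application)
// can then direct its jobs to its own scheduler pool:
sc.setLocalProperty("spark.scheduler.pool", "user_a_pool")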




bit1129@163.com