spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: RDD: Execution and Scheduling
Date Mon, 21 Sep 2015 01:01:34 GMT
On Sun, Sep 20, 2015 at 3:58 PM, gsvic <victorasgs@gmail.com> wrote:

> Concerning answers 1 and 2:
>
> 1) How does Spark determine that a node is a "slow node", and how slow is that?
>

There are two cases here:

1. If a node is busy (e.g. all slots are already occupied), the scheduler
cannot schedule anything on it. See the paper "Delay Scheduling: A Simple
Technique for Achieving Locality and Fairness in Cluster Scheduling" for how
locality scheduling is done.
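For readers who want the gist of the delay scheduling idea before opening the
paper, here is a minimal sketch of its core rule: when a slot frees up on a
host, a job with no data-local task there is skipped a bounded number of times
(D in the paper, `maxSkips` here) before it is allowed to run non-locally. The
`Task`, `Job` and `DelayScheduler` names are hypothetical and are not Spark's
classes. Spark itself waits by time (the spark.locality.wait setting) and
steps through several locality levels rather than counting skips.

```scala
import scala.collection.mutable

// Hypothetical model types for illustration; these are not Spark's classes.
final case class Task(id: Int, preferredHosts: Set[String])

final class Job(val name: String, tasks: Seq[Task]) {
  val pending: mutable.ArrayBuffer[Task] = mutable.ArrayBuffer.empty[Task] ++= tasks
  var running: Int = 0
  var skipCount: Int = 0 // consecutive times this job declined a non-local slot

  /** Removes the pending task at index i, marks it running, and reports it. */
  def launch(i: Int): (String, Int) = {
    val t = pending.remove(i)
    running += 1
    (name, t.id)
  }
}

object DelayScheduler {
  /** Called when `host` has a free slot. `maxSkips` plays the role of D in the paper.
    * Returns (job name, task id) for the launched task, or None if the slot stays idle. */
  def offer(host: String, jobs: Seq[Job], maxSkips: Int): Option[(String, Int)] = {
    // Fair sharing: jobs with the fewest running tasks get the first chance.
    val ordered = jobs.sortBy(_.running).iterator
    var launched: Option[(String, Int)] = None
    while (launched.isEmpty && ordered.hasNext) {
      val job = ordered.next()
      val localIdx = job.pending.indexWhere(_.preferredHosts.contains(host))
      if (localIdx >= 0) {
        // Data-local task available: launch it and reset the wait counter.
        launched = Some(job.launch(localIdx))
        job.skipCount = 0
      } else if (job.pending.nonEmpty) {
        // Only non-local work: launch it if the job has waited long enough,
        // otherwise skip this job for now and let the next one try.
        if (job.skipCount >= maxSkips) launched = Some(job.launch(0))
        else job.skipCount += 1
      }
    }
    launched
  }
}
```

With `maxSkips = 2`, a job whose data lives only on hostA lets two hostB
offers go by before it accepts a third one non-locally; an offer from hostA
is taken immediately.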

2. Within the same stage, if a task is much slower than the other tasks, a
copy of it can be launched speculatively to mitigate stragglers. This is
turned on with the spark.speculation setting; search for "speculation" in
the code base to find out more.
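As a rough illustration of what "slower than other tasks" can mean, the sketch
below flags a running task once enough of the stage has finished and the task
has run noticeably longer than the median finished task. The
`quantile`/`multiplier` parameters are loosely modeled on the
spark.speculation.quantile and spark.speculation.multiplier settings (defaults
0.75 and 1.5); the `Speculation.speculatable` helper and the 100 ms floor are
assumptions for illustration, not Spark's actual code.

```scala
// Hypothetical straggler detector, loosely modeled on Spark's speculation settings
// (spark.speculation.quantile, spark.speculation.multiplier); not Spark's code.
object Speculation {
  /** @param numTasks          total tasks in the stage
    * @param finishedDurations durations (ms) of tasks that already succeeded
    * @param runningElapsed    task id -> elapsed time (ms) of tasks still running
    * @return ids of running tasks that are candidates for a speculative copy */
  def speculatable(
      numTasks: Int,
      finishedDurations: Seq[Long],
      runningElapsed: Map[Int, Long],
      quantile: Double = 0.75,
      multiplier: Double = 1.5,
      minThresholdMs: Long = 100L): Set[Int] = {
    // Wait until a fraction `quantile` of the stage has finished (at least one task).
    val minFinished = math.max(1, math.floor(quantile * numTasks).toInt)
    if (finishedDurations.size < minFinished) Set.empty
    else {
      val sorted = finishedDurations.sorted
      val median = sorted(sorted.size / 2) // upper median for even counts
      // A task is "slow" if it has run `multiplier` times longer than the median.
      val threshold = math.max(multiplier * median, minThresholdMs.toDouble)
      runningElapsed.collect { case (id, elapsed) if elapsed > threshold => id }.toSet
    }
  }
}
```

For a 4-task stage where three tasks finished in 100, 110 and 120 ms, the
median is 110 ms, so a still-running task becomes a candidate once it passes
165 ms. Comparing against the stage's own median, rather than a fixed timeout,
keeps the rule meaningful for both short and long stages.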



> 2) How does an RDD choose a location as a preferred location, and with which
> criteria?
>

This is part of the RDD definition: the RDD interface itself defines
locality (see getPreferredLocations in RDD.scala). The Spark NSDI paper
already discusses this.
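To make "the RDD interface defines locality" concrete, here is a small
self-contained model: each RDD may report preferred hosts per partition (in the
spirit of Spark's `getPreferredLocations`), a file-backed RDD reports the hosts
holding each block, and a mapped RDD reports nothing of its own, so the
scheduler-side lookup falls back to its narrow (one-to-one) parent. `MiniRDD`,
`BlockRDD`, `MappedRDD` and `Locality` are hypothetical names, and the sketch
omits details such as cached partitions and shuffle dependencies.

```scala
// Hypothetical mini-model of the locality part of an RDD; not Spark code.
final case class Partition(index: Int)

trait MiniRDD {
  def partitions: Seq[Partition]
  /** Hosts where this partition's data lives, if this RDD knows. */
  def preferredLocations(p: Partition): Seq[String] = Nil
  /** Parent through a one-to-one (narrow) dependency, if any. */
  def parent: Option[MiniRDD] = None
}

/** Like a file-backed RDD: partition i is a block replicated on blockHosts(i). */
final class BlockRDD(blockHosts: Vector[Seq[String]]) extends MiniRDD {
  val partitions: Seq[Partition] = blockHosts.indices.map(i => Partition(i))
  override def preferredLocations(p: Partition): Seq[String] = blockHosts(p.index)
}

/** Like map(): no locality of its own, partitions line up with the parent's. */
final class MappedRDD(prev: MiniRDD) extends MiniRDD {
  def partitions: Seq[Partition] = prev.partitions
  override def parent: Option[MiniRDD] = Some(prev)
}

object Locality {
  /** Use the RDD's own preference; otherwise walk up narrow parents. */
  @annotation.tailrec
  def preferredHosts(rdd: MiniRDD, p: Partition): Seq[String] = {
    val own = rdd.preferredLocations(p)
    if (own.nonEmpty) own
    else rdd.parent match {
      case Some(par) => preferredHosts(par, p)
      case None      => Nil
    }
  }
}
```

Here a map over a two-block input still prefers the hosts holding each block,
even though the mapped RDD itself returns no locations.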

Why don't you just do a little bit of code reading yourself?



>
> Could you please also include links to the source files for the two
> questions above?
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Execution-and-Scheduling-tp14177p14226.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
