spark-dev mailing list archives

From Reynold Xin
Subject Re: RDD: Execution and Scheduling
Date Mon, 21 Sep 2015 01:01:34 GMT
On Sun, Sep 20, 2015 at 3:58 PM, gsvic wrote:

> Concerning answers 1 and 2:
> 1) How does Spark determine that a node is a "slow node", and how slow is that?

There are two cases here:

1. If a node is busy (e.g. all slots are already occupied), the scheduler
cannot schedule anything on it. See the paper "Delay Scheduling: A Simple
Technique for Achieving Locality and Fairness in Cluster Scheduling" for
how locality scheduling is done.

2. Within the same stage, if a task is slower than other tasks, a copy of
it can be launched speculatively in order to mitigate stragglers. Search
for speculation in the code base to find out more.
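
As a rough illustration of the two cases, here is a minimal Python sketch. It is not Spark's code: the function names, argument shapes, and constant values are assumptions made for the example. Spark exposes comparable settings as spark.locality.wait, spark.speculation.quantile, and spark.speculation.multiplier, and the real logic lives in the scheduler sources.

```python
from statistics import median

# Illustrative values only (assumptions for this sketch). The actual
# settings and their defaults are in the Spark configuration docs.
LOCALITY_WAIT_S = 3.0
SPECULATION_QUANTILE = 0.75
SPECULATION_MULTIPLIER = 1.5


def accept_offer(offer_host, preferred_hosts, waited_s,
                 locality_wait_s=LOCALITY_WAIT_S):
    """Case 1, delay scheduling: take a local slot right away; accept a
    non-local slot only after waiting locality_wait_s for a local one."""
    if not preferred_hosts or offer_host in preferred_hosts:
        return True
    return waited_s >= locality_wait_s


def speculatable_tasks(finished_durations, running_durations, num_tasks,
                       quantile=SPECULATION_QUANTILE,
                       multiplier=SPECULATION_MULTIPLIER):
    """Case 2, speculation: return ids of running tasks that look slow.

    finished_durations: durations (seconds) of completed tasks in the stage.
    running_durations: dict of task id -> elapsed seconds for running tasks.
    """
    # Wait until enough of the stage has finished to have a baseline.
    min_finished = int(quantile * num_tasks)
    if not finished_durations or len(finished_durations) < min_finished:
        return []
    # A task is a straggler if it runs much longer than the median.
    threshold = multiplier * median(finished_durations)
    return sorted(task for task, elapsed in running_durations.items()
                  if elapsed > threshold)
```

With these values, accept_offer("host-b", {"host-a"}, waited_s=1.0) declines the non-local slot and the same call with waited_s=3.5 accepts it. speculatable_tasks([10, 11, 12], {"t3": 30.0}, num_tasks=4) flags "t3", because 30 seconds exceeds 1.5 times the median of 11 seconds.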

> 2) How does an RDD choose a location as a preferred location, and by
> which criteria?

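This is part of the RDD definition. The RDD interface itself defines
locality: each RDD can report preferred locations for each of its
partitions. The Spark NSDI paper (the one on Resilient Distributed
Datasets) already talks about this.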
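
To make the idea concrete, here is a small Python sketch of an RDD-like interface that carries locality. The class and method names are hypothetical. In Spark the corresponding hook is the RDD method getPreferredLocations. Delegating straight to the parent is a simplification of how locality is resolved through narrow dependencies.

```python
class BlockBackedRDD:
    """Hypothetical source RDD: one partition per storage block."""

    def __init__(self, block_hosts):
        # block_hosts[i] lists the hosts that hold a replica of block i.
        self.block_hosts = block_hosts

    def partitions(self):
        return list(range(len(self.block_hosts)))

    def preferred_locations(self, partition):
        # The data for this partition lives on these hosts.
        return list(self.block_hosts[partition])


class MappedRDD:
    """Hypothetical narrow transformation that keeps its parent's locality."""

    def __init__(self, parent, fn):
        self.parent = parent
        self.fn = fn

    def partitions(self):
        return self.parent.partitions()

    def preferred_locations(self, partition):
        # Partition i is computed from parent partition i, so the best
        # place to run it is wherever the parent's data already is.
        return self.parent.preferred_locations(partition)
```

For example, with base = BlockBackedRDD([["host-a", "host-b"], ["host-c"]]) and mapped = MappedRDD(base, lambda x: x * 2), mapped.preferred_locations(0) gives ["host-a", "host-b"]. The scheduler can then try to place the task for partition 0 on one of those hosts.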

Why don't you just do a little bit of code reading yourself?

> Could you please also include the links of the source files for the two
> questions above?