spark-dev mailing list archives

From Prabhu Joseph <>
Subject Re: Spark Scheduler creating Straggler Node
Date Wed, 09 Mar 2016 05:52:18 GMT
I don't want to simply replicate all cached blocks. I am trying to find a way
to solve the issue I described in my earlier mail. Keeping replicas of all
cached blocks would add more cost for customers.
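As a rough illustration of that cost concern, the sketch compares the extra memory needed to add one replica of every cached block with replicating only the blocks that are fetched remotely often ("hot" blocks). `hot_blocks`, `extra_memory`, the fetch log, the threshold and the block sizes are all hypothetical; this is not a Spark API. Spark's built-in replication is chosen per RDD through storage levels such as `MEMORY_ONLY_2`.

```python
from collections import Counter
from typing import Dict, Iterable, List


def hot_blocks(remote_fetches: Iterable[str], min_fetches: int) -> List[str]:
    """Hypothetical policy: block ids fetched remotely at least min_fetches times."""
    counts = Counter(remote_fetches)
    return sorted(block for block, n in counts.items() if n >= min_fetches)


def extra_memory(block_sizes: Dict[str, int], to_replicate: Iterable[str]) -> int:
    """Memory needed to store one additional replica of each chosen block."""
    return sum(block_sizes[block] for block in to_replicate)


# Illustrative numbers only (sizes in MB); ids mimic Spark's rdd_<rddId>_<partition> naming.
sizes = {"rdd_1_0": 100, "rdd_1_1": 100, "rdd_1_2": 100, "rdd_1_3": 100}
fetch_log = ["rdd_1_0", "rdd_1_0", "rdd_1_0", "rdd_1_2"]

hot = hot_blocks(fetch_log, min_fetches=2)
print(hot)                          # ['rdd_1_0']
print(extra_memory(sizes, hot))     # 100
print(extra_memory(sizes, sizes))   # 400 (replicate everything)
```

Under these made-up numbers, replicating only the hot block costs a quarter of replicating every cached block; how to pick the threshold is the open question.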

On Wed, Mar 9, 2016 at 9:50 AM, Reynold Xin <> wrote:

> You just want to be able to replicate hot cached blocks, right?
> On Tuesday, March 8, 2016, Prabhu Joseph <> wrote:
>> Hi All,
>>     While a Spark job is running, one of the Spark executors on Node A
>> has some partitions cached. Later, in another stage, the scheduler tries to
>> assign a task to Node A to process a cached partition (PROCESS_LOCAL), but
>> by then Node A is busy with other tasks. The scheduler waits for the
>> spark.locality.wait interval, times out, and looks for another node B that
>> is NODE_LOCAL. The executor on Node B then fetches the cached partition from
>> Node A, which adds network I/O and extra CPU work for the transfer.
>> Eventually, every node has a task waiting to fetch some cached partition
>> from Node A, so the Spark job (and effectively the cluster) is blocked on a
>> single node.
>> A Spark JIRA has been created.
>> Starting with Spark 1.2, Spark introduced the External Shuffle Service,
>> which lets executors fetch shuffle files from an external service instead
>> of from each other, taking that load off the Spark executors.
>> We want to check whether a similar external service has been implemented
>> for transferring cached partitions to other executors.
>> Thanks, Prabhu Joseph
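To make the hot-spot mechanism in the quoted mail concrete, here is a toy simulation of delay scheduling. It is a hedged sketch under simplifying assumptions, not Spark's scheduler: `place_tasks`, the fixed task time, and the assumption that every task is submitted at t=0 are hypothetical, and `locality_wait` only plays the role of `spark.locality.wait`.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (task_id, node the task runs on, node it must fetch the cached block from, or None)
Placement = Tuple[int, str, Optional[str]]


def place_tasks(preferred: Dict[int, str],
                busy_until: Dict[str, float],
                locality_wait: float,
                task_time: float = 1.0) -> List[Placement]:
    """Toy delay scheduling (hypothetical model, not Spark's TaskSetManager).

    All tasks are submitted at t=0. A task waits for the node caching its
    partition if that node frees up within locality_wait; otherwise it runs
    on whichever other node frees up first and fetches the block remotely.
    """
    free_at = dict(busy_until)  # copy so the caller's dict is not mutated
    placements: List[Placement] = []
    for task_id, home in sorted(preferred.items()):
        if free_at[home] <= locality_wait:
            # Cache-holding node is available in time: run locally, no fetch.
            placements.append((task_id, home, None))
            free_at[home] += task_time
        else:
            # Locality wait expired: fall back to another node, remote fetch.
            others = [n for n in free_at if n != home]
            target = min(others, key=lambda n: free_at[n])
            placements.append((task_id, target, home))
            free_at[target] += task_time
    return placements


# Six partitions cached on node A, which stays busy well past the wait.
plan = place_tasks({t: "A" for t in range(6)},
                   {"A": 10.0, "B": 0.0, "C": 0.0, "D": 0.0},
                   locality_wait=3.0)
print(Counter(src for _, _, src in plan if src is not None))  # Counter({'A': 6})
```

With Node A busy past the wait, the six tasks run on B, C and D, but every one of them fetches its block from A, which is the single-node bottleneck the mail describes.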
