spark-issues mailing list archives

From "Imran Rashid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-2387) Remove the stage barrier for better resource utilization
Date Thu, 07 Feb 2019 15:28:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-2387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Imran Rashid updated SPARK-2387:
--------------------------------
    Component/s: Scheduler

> Remove the stage barrier for better resource utilization
> --------------------------------------------------------
>
>                 Key: SPARK-2387
>                 URL: https://issues.apache.org/jira/browse/SPARK-2387
>             Project: Spark
>          Issue Type: New Feature
>          Components: Scheduler, Spark Core
>            Reporter: Rui Li
>            Priority: Major
>
> DAGScheduler divides a Spark job into multiple stages according to RDD dependencies.
> Whenever there is a shuffle dependency, DAGScheduler creates a shuffle map stage on the map
> side and another stage that depends on it.
>
> Currently, a downstream stage cannot start until all the stages it depends on have finished.
> This barrier between stages leaves slots idle while the last few upstream tasks finish,
> which wastes cluster resources.
>
> Therefore we propose to remove the barrier and pre-start the reduce stage as soon as there are
> free slots. This can achieve better resource utilization and improve overall job performance,
> especially when many executors are granted to the application.
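
The idle-slot effect described above can be illustrated with a toy model. The sketch below is not Spark code and does not reflect how DAGScheduler assigns tasks; the function name `idle_slot_time_at_barrier`, the greedy first-free-slot assignment, and the task durations are all assumptions made for illustration. It estimates how much slot time sits unused while a map stage waits for its slowest task.

```python
import heapq


def idle_slot_time_at_barrier(task_durations, num_slots):
    """Return (barrier_time, idle_slot_time) for one map stage.

    Hypothetical model: each task goes, in order, to whichever slot
    becomes free first. The barrier is the moment the last task ends.
    """
    # Each heap entry is the time at which one slot becomes free.
    slots = [0.0] * num_slots
    heapq.heapify(slots)
    for duration in task_durations:
        free_at = heapq.heappop(slots)
        heapq.heappush(slots, free_at + duration)

    # The downstream stage may start only after the slowest slot is done.
    barrier_time = max(slots)
    # Slot time spent waiting between a slot's last task and the barrier.
    idle_slot_time = sum(barrier_time - t for t in slots)
    return barrier_time, idle_slot_time


if __name__ == "__main__":
    # Three short tasks and one straggler on four slots.
    print(idle_slot_time_at_barrier([1, 1, 1, 5], num_slots=4))
```

In this model, a single straggler leaves the other three slots idle for most of the stage. That idle slot time is what pre-starting the reduce stage is meant to put to use, although how much of it reduce tasks could fill in practice depends on how early they can fetch map output.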



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

