spark-issues mailing list archives

From "Dongjoon Hyun (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-24874) Allow hybrid of both barrier tasks and regular tasks in a stage
Date Mon, 16 Mar 2020 22:52:08 GMT

     [ https://issues.apache.org/jira/browse/SPARK-24874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-24874:
----------------------------------
    Affects Version/s:     (was: 3.0.0)
                       3.1.0

> Allow hybrid of both barrier tasks and regular tasks in a stage
> ---------------------------------------------------------------
>
>                 Key: SPARK-24874
>                 URL: https://issues.apache.org/jira/browse/SPARK-24874
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.1.0
>            Reporter: Xingbo Jiang
>            Priority: Major
>
> Currently we only allow barrier tasks in a barrier stage. However, consider the following query:
> {code}
> // Assumes `conf` is an existing SparkConf.
> val sc = new SparkContext(conf)
> val rdd1 = sc.parallelize(1 to 100, 10)   // 10 regular partitions
> val rdd2 = sc.parallelize(1 to 1000, 20).barrier().mapPartitions(it => it)   // 20 barrier partitions
> val rdd = rdd1.union(rdd2).mapPartitions(t => t)
> {code}
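> Running `rdd.collect()` now requires 30 free slots, because all 30 tasks of the stage (10 from rdd1 plus 20 from rdd2) have to be launched together. In fact, the tasks that collect data from rdd1's partitions could be launched as regular tasks, since they are not required to start together with the barrier tasks. If we can do that, we only need 20 free slots to run `rdd.collect()`.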
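
The slot arithmetic above can be sketched in plain Scala. This is only an illustrative model, not Spark's scheduler: `PartitionGroup`, `SlotEstimate`, `slotsAllBarrier`, and `slotsHybrid` are hypothetical names introduced here, and the model assumes one slot per task.

```scala
// Hypothetical model of the slot requirement; none of these names exist in Spark.
final case class PartitionGroup(name: String, numPartitions: Int, isBarrier: Boolean)

object SlotEstimate {
  // Current behaviour: every task in a barrier stage is a barrier task,
  // so all partitions of the stage must be launched at the same time.
  def slotsAllBarrier(groups: Seq[PartitionGroup]): Int =
    groups.map(_.numPartitions).sum

  // Proposed hybrid behaviour: only the barrier partitions need simultaneous
  // slots; regular tasks can run later in whatever slots become free.
  def slotsHybrid(groups: Seq[PartitionGroup]): Int =
    groups.filter(_.isBarrier).map(_.numPartitions).sum

  // The stage from the example: rdd1 (regular) unioned with rdd2 (barrier).
  val example: Seq[PartitionGroup] = Seq(
    PartitionGroup("rdd1", numPartitions = 10, isBarrier = false),
    PartitionGroup("rdd2", numPartitions = 20, isBarrier = true)
  )

  def main(args: Array[String]): Unit = {
    println(s"all-barrier stage: ${slotsAllBarrier(example)} slots") // 30
    println(s"hybrid stage:      ${slotsHybrid(example)} slots")     // 20
  }
}
```

Under this assumption the difference between the two numbers is exactly the partition count of the non-barrier parent, which is the saving the issue is asking for.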



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


