spark-user mailing list archives

From Andreas Kunft <>
Subject Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions
Date Sun, 01 Aug 2021 17:27:50 GMT

@Sean: Since Spark 3.x, stage level resource scheduling is available.

@Gourav: I'm using the latest version of Spark, 3.1.2. I want to split the
two maps onto different executors, as both the GPU function and the CPU
function take quite some time, so it would be great to have element n being
processed in the GPU function while n + 1 is already computed in the CPU
function. As a workaround, I write the results of the GPU task to a queue
which is consumed by another job that executes the CPU function.
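The pipelining effect the workaround aims for can be illustrated outside Spark with a bounded queue between two stages, so one element is post-processed while the next is still being produced. This is a stand-alone sketch; `gpu_map` and `cpu_map` are hypothetical placeholders for the two work-intensive functions, not Spark API:

```python
# Sketch of the queue-based pipelining described in the workaround:
# a producer stage (standing in for the GPU map) feeds a bounded queue
# consumed by a second stage (standing in for the CPU map).
import queue
import threading

def gpu_map(x):          # placeholder for the GPU-bound function
    return x * x

def cpu_map(x):          # placeholder for the CPU-bound function
    return x + 1

def pipeline(data):
    q = queue.Queue(maxsize=4)   # bounded queue provides backpressure
    results = []

    def producer():
        for item in data:
            q.put(gpu_map(item))
        q.put(None)              # sentinel: no more items

    t = threading.Thread(target=producer)
    t.start()
    while (item := q.get()) is not None:
        results.append(cpu_map(item))
    t.join()
    return results

print(pipeline([1, 2, 3]))  # → [2, 5, 10]
```

In Spark terms, the two "stages" here correspond to the two jobs decoupled by the external queue, each of which can then run with its own resource profile.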

Do you have any idea whether resource-based scheduling for individual
functions is a planned feature for the future?


On Sun, Aug 1, 2021 at 6:53 PM Gourav Sengupta <> wrote:

> Hi Andreas,
> Just to understand the question first: what do you want to achieve by
> splitting the map operations across the GPU and CPU?
> It would also be helpful to know the version of Spark you are using,
> and a bit more about your GPU setup.
> Regards,
> Gourav
> On Sat, Jul 31, 2021 at 9:57 AM Andreas Kunft <>
> wrote:
>> I have a setup with two work-intensive tasks: one map using the GPU,
>> followed by a map using only the CPU.
>> Using stage level resource scheduling, I request a GPU node, but I would
>> also like to execute the subsequent CPU map on a different executor so
>> that the GPU node is not blocked.
>> However, Spark will always combine the two maps due to the narrow
>> dependency, and thus I cannot define two different resource requirements.
>> So the question is: can I force the two map functions onto different
>> executors without shuffling? Or, even better, is there a plan to enable
>> this by assigning different resource requirements?
>> Best
