spark-user mailing list archives

From Andreas Kunft <andreas.ku...@gmail.com>
Subject Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions
Date Sun, 01 Aug 2021 17:27:50 GMT
Hi,

@Sean: Stage-level resource scheduling has been available since Spark 3.1:
https://databricks.com/session_na21/stage-level-scheduling-improving-big-data-and-ai-integration

@Gourav: I'm using Spark 3.1.2, the latest version. I want to run the two
maps on different executors because both the GPU function and the CPU
function take quite some time. Ideally, element n + 1 would already be
processed in the GPU function while element n is still in the CPU function.
As a workaround, I write the results of the GPU task to a queue, which is
consumed by another job that executes the CPU task.
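
To illustrate the overlap this workaround is after, here is a minimal sketch
in plain Python, with two threads standing in for the two jobs. The names
gpu_stage, cpu_stage and run_pipeline are hypothetical placeholders, not
Spark APIs, and the in-process queue.Queue only approximates the queue that
sits between the separate jobs:

```python
import queue
import threading


def gpu_stage(x):
    # Hypothetical stand-in for the GPU-bound map function.
    return x * 2


def cpu_stage(x):
    # Hypothetical stand-in for the CPU-bound map function.
    return x + 1


_DONE = object()  # sentinel that marks the end of the stream


def run_pipeline(elements, maxsize=4):
    # Bounded queue: the first stage cannot run arbitrarily far ahead.
    handoff = queue.Queue(maxsize=maxsize)
    results = []

    def producer():
        # First stage: process each element and hand it off immediately.
        for x in elements:
            handoff.put(gpu_stage(x))
        handoff.put(_DONE)

    def consumer():
        # Second stage: runs while the producer works on later elements.
        while True:
            item = handoff.get()
            if item is _DONE:
                break
            results.append(cpu_stage(item))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a single consumer the output order matches the input order. In the
actual setup the hand-off crosses job boundaries, so each side can request
its own resources independently.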

Do you know whether scheduling based on per-function resource requirements
is planned as a future feature?

Best
Andreas


On Sun, Aug 1, 2021 at 6:53 PM Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> Hi Andreas,
>
> Just to understand the question first: what do you want to achieve by
> splitting the map operations across the GPU and CPU?
>
> It would also help to know which version of Spark you are using, and a
> bit more about your GPU setup.
>
>
> Regards,
> Gourav
>
> On Sat, Jul 31, 2021 at 9:57 AM Andreas Kunft <andreas.kunft@gmail.com>
> wrote:
>
>> I have a setup with two work-intensive tasks: one map using the GPU,
>> followed by a map using only the CPU.
>>
>> Using stage level resource scheduling, I request a GPU node, but would
>> also like to execute the consecutive CPU map on a different executor so
>> that the GPU node is not blocked.
>>
>> However, Spark will always combine the two maps due to the narrow
>> dependency, and thus I cannot define two different resource requirements.
>>
>> So the question is: can I force the two map functions onto different
>> executors without shuffling, or, even better, is there a plan to enable
>> this by assigning different resource requirements?
>>
>> Best
>>
>
