spark-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions
Date Thu, 05 Aug 2021 17:56:25 GMT
Doesn't a persist break stages?

On Thu, Aug 5, 2021, 11:40 AM Tom Graves <tgraves_cs@yahoo.com.invalid>
wrote:

> As Sean mentioned, it's only available at the stage level, but you said you
> don't want to shuffle, so splitting into stages doesn't help you.  Without
> more details, it seems like you could "hack" this by just requesting an
> executor with 1 GPU (allowing 2 tasks per GPU) and 2 CPUs; one task would
> use the GPU and the other could just use the CPU.  Perhaps that is too
> simplistic or brittle, though.
>
> Tom
> On Saturday, July 31, 2021, 03:56:18 AM CDT, Andreas Kunft <
> andreas.kunft@gmail.com> wrote:
>
>
> I have a setup with two work-intensive tasks: one map using a GPU, followed
> by a map using only the CPU.
>
> Using stage level resource scheduling, I request a GPU node, but would
> also like to execute the consecutive CPU map on a different executor so
> that the GPU node is not blocked.
>
> However, Spark will always combine the two maps due to the narrow
> dependency, and thus I cannot define two different resource requirements.
>
> So the question is: can I force the two map functions onto different
> executors without shuffling, or, even better, is there a plan to enable this
> by assigning different resource requirements?
>
> Best
>
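Note on Tom's suggested executor shape: the arithmetic behind "1 GPU, 2 tasks per GPU, 2 CPUs" can be sketched in plain Python. The configuration keys below are standard Spark property names, but the values are assumptions chosen to match the thread, and the helper task_slots_per_executor is hypothetical (it is not a Spark API). It is a simplified model of how the scarcest resource bounds the number of concurrent tasks on one executor, not Spark's actual scheduler code.

```python
import math

# Assumed settings for the executor shape described in the thread:
# 2 cores and 1 GPU per executor, each task asking for 1 core and half a GPU.
conf = {
    "spark.executor.cores": "2",
    "spark.executor.resource.gpu.amount": "1",
    "spark.task.cpus": "1",
    "spark.task.resource.gpu.amount": "0.5",
}


def task_slots_per_executor(conf):
    """Hypothetical helper: concurrent tasks one executor can hold.

    Simplified model: the slot count is limited by whichever resource
    (CPU cores or GPU share) runs out first.
    """
    cpu_slots = int(conf["spark.executor.cores"]) // int(conf["spark.task.cpus"])
    gpu_slots = math.floor(
        float(conf["spark.executor.resource.gpu.amount"])
        / float(conf["spark.task.resource.gpu.amount"])
    )
    return min(cpu_slots, gpu_slots)


print(task_slots_per_executor(conf))
```

With these assumed values both limits allow two concurrent tasks per executor. Raising spark.task.resource.gpu.amount to 1 would drop that to a single task. Note that in this model every task still reserves half a GPU on paper, whether or not its code touches the device, which is part of why the approach may be brittle.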
