spark-user mailing list archives

From Tom Graves <tgraves...@yahoo.com.INVALID>
Subject Re: [Spark Core, PySpark] Separate stage level scheduling for consecutive map functions
Date Thu, 05 Aug 2021 17:39:47 GMT
As Sean mentioned, it's only available at the stage level, but you said you don't want to shuffle,
so splitting into stages doesn't help you. Without more details, it seems like you could
"hack" this by just requesting an executor with 1 GPU (allowing 2 tasks per GPU) and 2 CPUs;
one task would use the GPU and the other could just use the CPU. Perhaps that is
too simplistic or brittle though.
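Something like this rough PySpark sketch (the discovery script path is a placeholder, and
gpu_map/cpu_map stand in for your functions; note stage-level scheduling needs dynamic
allocation enabled):

    from pyspark.resource import (ExecutorResourceRequests,
                                  TaskResourceRequests,
                                  ResourceProfileBuilder)

    # Executors with 2 CPU cores and 1 GPU each.
    ereqs = ExecutorResourceRequests().cores(2) \
        .resource("gpu", 1, discoveryScript="/path/to/getGpus.sh")

    # 1 CPU and half a GPU per task => 2 concurrent tasks per executor,
    # one of which can drive the GPU while the other stays CPU-only.
    treqs = TaskResourceRequests().cpus(1).resource("gpu", 0.5)

    profile = ResourceProfileBuilder().require(ereqs).require(treqs).build

    result = rdd.withResources(profile).map(gpu_map).map(cpu_map)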
Tom

On Saturday, July 31, 2021, 03:56:18 AM CDT, Andreas Kunft <andreas.kunft@gmail.com> wrote:
I have a setup with two work-intensive tasks: a map using the GPU, followed by a map using
only the CPU.
Using stage-level resource scheduling, I request a GPU node, but I would also like to execute
the consecutive CPU map on a different executor so that the GPU node is not blocked.
However, Spark will always combine the two maps into one stage due to the narrow dependency,
and thus I cannot define two different resource requirements.
So the question is: can I force the two map functions onto different executors without shuffling?
Or, even better, is there a plan to enable this by assigning different resource requirements?
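
To illustrate, a minimal sketch of what I mean (assuming the RDD API; gpu_profile and
cpu_profile are ResourceProfiles built in the usual way, and gpu_map/cpu_map are
placeholders). The only way I see to apply a second profile today is to introduce a
shuffle, e.g. via repartition, which is exactly the cost I want to avoid:

    # The two maps pipeline into one stage, so only one profile applies.
    gpu_stage = rdd.withResources(gpu_profile).map(gpu_map)

    # A repartition forces a shuffle, creating a stage boundary where
    # a second ResourceProfile can take effect.
    cpu_stage = (gpu_stage
                 .repartition(gpu_stage.getNumPartitions())
                 .withResources(cpu_profile)
                 .map(cpu_map))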
Best  