spark-user mailing list archives

From Jacek Laskowski <>
Subject Re: How can we control CPU and Memory per Spark job operation..
Date Sat, 16 Jul 2016 22:45:49 GMT

My understanding is that these two map functions will end up as a job
with one stage (as if you wrote the two maps as a single map), so you
really need as many vcores and as much memory as possible for both
map1 and map2. I initially thought dynamic allocation of executors
might help in this case, but since there's just one stage I don't
think you can do much.
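
For what it's worth, here's a minimal sketch of why the two maps share
one stage (the paths and function bodies are placeholders, not your
actual pipeline):

    import org.apache.spark.{SparkConf, SparkContext}

    object TwoMapsOneStage {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("two-maps"))

        val input = sc.textFile("hdfs:///path/to/input")  // placeholder path

        // Stand-in for the CPU-intensive map (map1)
        val parsed = input.map(line => line.split("\\s+"))

        // Stand-in for the memory-intensive NLP map (map2)
        val scored = parsed.map(tokens => tokens.length)

        // No shuffle between the two maps, so Spark pipelines them into
        // ONE stage: each partition flows through map1 and map2 in the
        // same task, on the same executor, with the same resources.
        scored.saveAsTextFile("hdfs:///path/to/output")   // placeholder path

        sc.stop()
      }
    }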

Jacek Laskowski
Mastering Apache Spark

On Fri, Jul 15, 2016 at 9:54 PM, Pavan Achanta <> wrote:
> Hi All,
> Here is my use case:
> I have a pipeline job consisting of two map functions:
> 1. A CPU-intensive map operation that does not require much memory.
> 2. A memory-intensive map operation that requires up to 4 GB of
> memory. This 4 GB cannot be distributed since it is an NLP model.
> Ideally I would like to use 20 nodes with 4 cores each and minimal
> memory for the first map operation, and then only 3 nodes with
> minimal CPU but 4 GB of memory each for the second operation.
> While it is possible to control the parallelism for each map
> operation in Spark, I am not sure how to control the resources for
> each operation. Obviously I don't want to start the whole job with 20
> nodes that each have 4 cores and 4 GB of memory, since I cannot
> afford that much memory.
> We use YARN with Spark. Any suggestions?
> Thanks and regards,
> Pavan
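
P.S. To illustrate why I don't think you can vary resources per stage:
on YARN, executor sizing is fixed once per application, so every stage
runs on the same executors. A sketch with illustrative values (not
your actual configuration):

    import org.apache.spark.{SparkConf, SparkContext}

    // These settings apply to the whole application on YARN;
    // there is no per-stage or per-map override.
    val conf = new SparkConf()
      .setAppName("pipeline")                 // placeholder app name
      .set("spark.executor.instances", "20")  // executors for the app
      .set("spark.executor.cores", "4")       // cores per executor
      .set("spark.executor.memory", "4g")     // heap per executor
    val sc = new SparkContext(conf)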

