spark-user mailing list archives

From Pedro Rodriguez
Subject Re: How can we control CPU and Memory per Spark job operation..
Date Sun, 17 Jul 2016 05:18:11 GMT
You could call map on an RDD that has “many” partitions, then call repartition/coalesce
to drastically reduce the number of partitions so that your second map job has fewer things
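
A minimal PySpark sketch of this suggestion, under stated assumptions: `cpu_heavy`, `memory_heavy`, `run_pipeline`, and the target of 3 partitions are hypothetical placeholders, not code from the thread. It uses `repartition`, which shuffles and so starts a new stage. A plain `coalesce` without a shuffle stays in the same stage, so the first map would also run with the reduced number of tasks. Note that this only changes how many tasks run each map; it does not by itself change how much memory each executor is given.

```python
def cpu_heavy(x):
    """Hypothetical stand-in for the CPU-bound first map."""
    return x * x


def memory_heavy(x):
    """Hypothetical stand-in for the memory-bound second map (the NLP model)."""
    return x + 1


def run_pipeline(rdd, narrow_partitions=3):
    """Run the first map at the RDD's current partition count, then the
    second map on a much smaller number of partitions."""
    # First map: one task per existing partition (the "many" partitions).
    mapped = rdd.map(cpu_heavy)
    # repartition() shuffles the data into fewer partitions, which puts the
    # second map in a separate stage with fewer concurrent tasks.
    narrowed = mapped.repartition(narrow_partitions)
    # Second map: at most `narrow_partitions` tasks run at once.
    return narrowed.map(memory_heavy)
```

With a live `SparkContext` named `sc` (an assumption here), this could be called as `run_pipeline(sc.parallelize(range(1000), 80))`; calling `getNumPartitions()` on the result should then report 3.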

Pedro Rodriguez
PhD Student in Large-Scale Machine Learning | CU Boulder
Systems Oriented Data Scientist
UC Berkeley AMPLab Alumni | 909-353-4423 | LinkedIn

On July 16, 2016 at 4:46:04 PM, Jacek Laskowski wrote:


My understanding is that these two map functions will end up as a job
with one stage (as if you wrote the two maps as a single map), so you
really need as many vcores and as much memory as possible for map1 and map2. I
initially thought about dynamic allocation of executors, which may or
may not help you in this case, but since there's just one stage I
don't think you can do much.
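
To make the cost of the single-stage case concrete, here is a rough back-of-the-envelope calculation using the figures from the original question. It assumes one executor per node and one copy of the 4 GB model per executor, and it ignores the "minimal" memory of the first phase; `total_memory_gb` is a hypothetical helper, not a Spark API.

```python
def total_memory_gb(nodes, memory_per_node_gb):
    """Total memory requested across all nodes, in GB."""
    return nodes * memory_per_node_gb


# Single stage: every one of the 20 nodes must be able to hold the 4 GB model.
single_stage_gb = total_memory_gb(nodes=20, memory_per_node_gb=4)

# What the question asks for in the second phase: only 3 nodes with 4 GB each.
second_phase_gb = total_memory_gb(nodes=3, memory_per_node_gb=4)

print(single_stage_gb, second_phase_gb)  # 80 12
```

Under these assumptions the single-stage job requests 80 GB in total, against the 12 GB the second operation would need on its own.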

Jacek Laskowski  
Mastering Apache Spark  
Follow me at  

On Fri, Jul 15, 2016 at 9:54 PM, Pavan Achanta wrote:
> Hi All,
> Here is my use case:
> I have a pipeline job consisting of 2 map functions:
> 1. A CPU-intensive map operation that does not require a lot of memory.
> 2. A memory-intensive map operation that requires up to 4 GB of memory. This
>    4 GB of memory cannot be distributed since it is an NLP model.
> Ideally, what I would like to do is use 20 nodes with 4 cores each and minimal
> memory for the first map operation, and then use only 3 nodes with minimal CPU
> but 4 GB of memory each for the second operation.
> While it is possible to control the parallelism of each map operation in
> Spark, I am not sure how to control the resources for each operation.
> Obviously I don’t want to start the job with 20 nodes with 4 cores and
> 4 GB of memory each, since I cannot afford that much memory.
> We use YARN with Spark. Any suggestions?
> Thanks and regards,
> Pavan
