spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: How can we control CPU and Memory per Spark job operation..
Date Sat, 16 Jul 2016 22:45:49 GMT
Hi,

My understanding is that these two map functions will end up as a job
with one stage (as if you had written the two maps as a single map), so
you really need as many vcores and as much memory as possible for both
map1 and map2. I initially thought about dynamic allocation of
executors, which may or may not help in your case, but since there's
just one stage I don't think you can do much.
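
For illustration only, here is a minimal sketch (cpuHeavy and
memoryHeavy are hypothetical stand-ins for your two map operations, and
the dynamic allocation settings are just the standard YARN ones, shown
to make the point above concrete). The two chained maps are narrow
transformations, so Spark pipelines them into a single stage;
toDebugString prints no shuffle boundary between them:

import org.apache.spark.{SparkConf, SparkContext}

object TwoMapsOneStage {
  def main(args: Array[String]): Unit = {
    // Dynamic allocation is included only to illustrate the point above:
    // on YARN it requires the external shuffle service, and it grows or
    // shrinks executors between stages, so with a single stage it cannot
    // re-shape resources between map1 and map2.
    val conf = new SparkConf()
      .setAppName("two-maps-one-stage")
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")
    val sc = new SparkContext(conf)

    // Hypothetical stand-ins for the two map operations in the question.
    val cpuHeavy    = (x: Int) => x * 2        // CPU-bound step (map1)
    val memoryHeavy = (x: Int) => s"nlp-$x"    // memory-bound step (map2)

    val rdd = sc.parallelize(1 to 1000, numSlices = 80)
      .map(cpuHeavy)
      .map(memoryHeavy)

    // Both maps are pipelined into one stage; no shuffle appears between them.
    println(rdd.toDebugString)

    rdd.count()  // triggers the single-stage job
    sc.stop()
  }
}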

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Jul 15, 2016 at 9:54 PM, Pavan Achanta <pachanta@sysomos.com> wrote:
> Hi All,
>
> Here is my use case:
>
> I have a pipeline job consisting of 2 map functions:
>
> - A CPU-intensive map operation that does not require a lot of memory.
> - A memory-intensive map operation that requires up to 4 GB of memory;
>   this 4 GB cannot be distributed since it is an NLP model.
>
> Ideally, what I'd like to do is use 20 nodes with 4 cores each and minimal
> memory for the first map operation, and then use only 3 nodes with minimal
> CPU but 4 GB of memory each for the second operation.
>
> While it is possible to control the parallelism for each map operation in
> Spark, I am not sure how to control the resources for each operation.
> Obviously I don't want to start the job with 20 nodes, each with 4 cores
> and 4 GB of memory, since I cannot afford that much memory.
>
> We use YARN with Spark. Any suggestions?
>
> Thanks and regards,
> Pavan
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

