samza-dev mailing list archives

From Chris Riccomini <criccom...@linkedin.com.INVALID>
Subject Re: Samza Memory Usage on YARN
Date Thu, 18 Sep 2014 20:53:20 GMT
Hey Lukas,

> In my case for some simple tasks that just process stateless messages,
>the AM container is taking essentially half of the resources - one
>container for the AM and one for the task instances, which again seem to
>request >1GB of virtual memory.

Yes, this is just something that we live with. An interesting JIRA might
be to consolidate the AM and SamzaContainer into just one container in
cases where you don't want to waste resources, but we haven't bothered
with this thus far.

Cheers,
Chris

On 9/18/14 1:49 PM, "Lukas Steiblys" <lukas@doubledutch.me> wrote:

>Ok, I have underestimated how much the AM is doing.
>
>> All of our containers run with at least 1G, and the AM becomes
>> completely negligible compared to the total amount of resources a job
>> uses.
>
>In my case for some simple tasks that just process stateless messages, the
>AM container is taking essentially half of the resources - one container
>for the AM and one for the task instances, which again seem to request >1GB
>of virtual memory. Shouldn't the task container be a little more
>lightweight then?
>
>Thank you for the detailed explanation. Maybe my problem size is not big
>enough and it will all make sense when we have to process orders of
>magnitude more data.
>
>Lukas
>
>
>-----Original Message-----
>From: Chris Riccomini
>Sent: Thursday, September 18, 2014 1:31 PM
>To: dev@samza.incubator.apache.org
>Subject: Re: Samza Memory Usage on YARN
>
>Hey Lukas,
>
>As a preamble, I have to say, if you consider 200MB of memory usage an
>incredibly large amount of memory, you're probably either working with the
>wrong system, or worrying about optimizing the wrong thing. Your
>SamzaContainers are likely not going to be able to run without a few
>hundred megabytes of space. All of our containers run with at least 1G,
>and the AM becomes completely negligible compared to the total amount of
>resources a job uses.
>
>The defaults for the AM and the SamzaContainer are the same:
>
>  -Xmx768M
>  1000MB containers
>
>This means that YARN will kill your process (AM or SamzaContainer) if it
>goes over the 1G limit, and a container will OOME if it goes over 768MB of
>heap usage.
>
>First, I'll address the AM's heap. There are two main reasons why we want
>a 768MB heap.
>
>* The AM runs a Scalatra webapp, which requires significant heap when it
>runs. We tried other -Xmx settings, but 768 seemed to be the lowest stable
>setting for all jobs.
>* Samza's core code is implemented in Scala, which can bloat the JVM. A
>quick glance shows about 12% of heap used for random scala.reflect
>classes.
>
>The 1G container limit (vs. 768MB heap) is to give the AM extra space for
>things like:
>
>* perm gen
>* off-heap space
>* page cache
>* thread stacks
>
>> Is this behavior common or could this be some misconfiguration?
>
>It is common. I took a look at some of our jobs. They're running between
>150MB and 250MB in steady state. When I load the AM webpage, the heap
>spikes up to ~300MB.
>
>> As I understand, one of the problems is that each container has its own
>>VM instance and has to load all the libraries. Could there be some other
>>issues?
>
>There is a little bit of inefficiency from this, but it should be
>negligible. The 200MB of heap usage that you're seeing are actual objects
>being used by the AM. Don't forget that the AM is running a YARN client, a
>web service, a MetricsReporter, etc.
>
>If you're unhappy with the amount of memory that the AM is taking up, the
>first thing that you can do is to tune these two settings:
>
>  yarn.am.opts (to set -Xmx)
>  yarn.am.container.memory.mb (to lower YARN container memory mb)
>
>
>You can experiment to see how low you can get the heap and container
>settings.
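>
>A minimal sketch of how that might look in a job's properties file. The
>values are hypothetical starting points for experimentation, not
>recommendations; the defaults described above are -Xmx768M and 1000:
>
>  # Hypothetical lower AM heap (default described above: -Xmx768M)
>  yarn.am.opts=-Xmx512M
>  # Hypothetical lower AM container size in MB; keep it above the heap so
>  # perm gen, thread stacks, and other off-heap usage still fit
>  yarn.am.container.memory.mb=768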
>
>Cheers,
>Chris
>
>On 9/18/14 10:13 AM, "Lukas Steiblys" <lukas@doubledutch.me> wrote:
>
>>Hello,
>>
>>I'm trying to use Samza for our new data processing pipeline using YARN
>>for job scheduling and I've noticed that it consumes an incredibly large
>>amount of memory. Running the Application Master, which should be a very
>>lightweight application in my opinion, consumes ~1.4GB of virtual
>>memory and ~200MB of physical memory. Same goes for the actual tasks.
>>
>>Is this behavior common or could this be some misconfiguration? As I
>>understand, one of the problems is that each container has its own VM
>>instance and has to load all the libraries. Could there be some other
>>issues? Maybe it's possible to actually split the application master
>>package from the task package so it's more lightweight?
>>
>>Lukas
>
