samza-dev mailing list archives

From "Lukas Steiblys" <lu...@doubledutch.me>
Subject Re: Samza Memory Usage on YARN
Date Thu, 18 Sep 2014 20:49:46 GMT
Ok, I have underestimated how much the AM is doing.

>  All of our containers run with at least 1G, and the AM becomes completely 
> negligible compared to the total amount of resources a job uses.

In my case, for some simple tasks that just process stateless messages, the
AM container takes essentially half of the resources: one container for the
AM and one for the task instances, which again seem to request >1GB of
virtual memory. Shouldn't the task container be a little more lightweight,
then?

Thank you for the detailed explanation. Maybe my problem size is not big 
enough and it will all make sense when we have to process orders of 
magnitude more data.

Lukas


-----Original Message----- 
From: Chris Riccomini
Sent: Thursday, September 18, 2014 1:31 PM
To: dev@samza.incubator.apache.org
Subject: Re: Samza Memory Usage on YARN

Hey Lukas,

As a preamble, I have to say, if you consider 200MB of memory usage an
incredibly large amount of memory, you're probably either working with the
wrong system, or worrying about optimizing the wrong thing. Your
SamzaContainers are likely not going to be able to run without a few
hundred megabytes of space. All of our containers run with at least 1G,
and the AM becomes completely negligible compared to the total amount of
resources a job uses.

The defaults for the AM and the SamzaContainer are the same:

  -Xmx768M
  1000MB containers

This means that YARN will kill your process (AM or SamzaContainer) if it
goes over the 1G limit, and a container will OOME if it goes over 768MB of
heap usage.

First, I'll address the AM's heap. There are two main reasons why we want
a 768MB heap.

* The AM runs a Scalatra webapp, which requires significant heap when it
runs. We tried other -Xmx settings, but 768 seemed to be the lowest stable
setting for all jobs.
* Samza's core code is implemented in Scala, which can bloat the JVM. A
quick glance shows about 12% of heap used for random scala.reflect classes.

The 1G container limit (vs. 768MB heap) is to give the AM extra space for
things like:

* perm gen
* off-heap space
* page cache
* thread stacks

> Is this behavior common or could this be some misconfiguration?

It is common. I took a look at some of our jobs. They're running between
150MB and 250MB in steady state. When I load the AM webpage, the heap
spikes up to ~300MB.

> As I understand, one of the problems is that each container has its own
>VM instance and has to load all the libraries. Could there be some other
>issues?

There is a little bit of inefficiency from this, but it should be
negligible. The 200MB of heap usage that you're seeing are actual objects
being used by the AM. Don't forget that the AM is running a YARN client, a
web service, a MetricsReporter, etc.

If you're unhappy with the amount of memory that the AM is taking up, the
first thing that you can do is to tune these two settings:

  yarn.am.opts (to set -Xmx)
  yarn.am.container.memory.mb (to lower YARN container memory mb)


You can experiment to see how low you can get the heap and container
settings.
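
As a rough sketch of what that experiment might look like in a job's
.properties file: the two keys are the ones named above, but the values are
illustrative assumptions only, not tested recommendations. They are picked
to sit a little above the ~300MB heap spike mentioned earlier.

```properties
# Illustrative values only (assumption): shrink the AM heap and container.

# JVM options for the AM. The default described above is -Xmx768M.
yarn.am.opts=-Xmx384M

# Memory requested from YARN for the AM container, in MB. Keep this above
# the heap cap so there is room for perm gen, off-heap space, and thread
# stacks.
yarn.am.container.memory.mb=512
```

Depending on the cluster's yarn.scheduler.minimum-allocation-mb setting,
YARN may round a smaller request back up, so it is worth checking what the
container is actually granted.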

Cheers,
Chris

On 9/18/14 10:13 AM, "Lukas Steiblys" <lukas@doubledutch.me> wrote:

>Hello,
>
>I'm trying to use Samza for our new data processing pipeline using YARN
>for job scheduling and I've noticed that it consumes an incredibly large
>amount of memory. Running the Application Master, that should be a very
>lightweight application in my opinion, consumes around ~1.4GB of virtual
>memory and ~200MB of physical memory. Same goes for the actual tasks.
>
>Is this behavior common or could this be some misconfiguration? As I
>understand, one of the problems is that each container has its own VM
>instance and has to load all the libraries. Could there be some other
>issues? Maybe it's possible to actually split the application master
>package from the task package so it's more lightweight?
>
>Lukas

