hadoop-mapreduce-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: JobTracker memory usage peaks once a day and OOM sometimes
Date Tue, 08 Feb 2011 20:46:52 GMT
Hi Maxim,

I thought I had already responded to this question a few weeks ago, but
maybe not.

Looking at the raw heap usage of a Java process running the default garbage
collectors is misleading. In particular, unless you are using the concurrent
mark and sweep (CMS) collector, a collection of the old generation won't
begin until the old generation is actually full. So you will see the heap
fill up and then drop back down, just as you're describing.
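
To make the difference between "used heap" and "live data" concrete, here is
a minimal sketch using the standard java.lang.management API. It reports on
whatever JVM it runs in, so it illustrates the idea rather than inspecting a
remote JobTracker. The class name GcPoolReport is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper class, not part of Hadoop.
public class GcPoolReport {

    /** Builds a text report of the collectors and heap pools of the current JVM. */
    public static String report() {
        StringBuilder sb = new StringBuilder();

        // Which collectors are active, and how often they have run so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("collector %s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }

        // For each heap pool, compare current usage with usage right after
        // the most recent collection of that pool.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP || !pool.isValid()) {
                continue;
            }
            MemoryUsage now = pool.getUsage();
            MemoryUsage afterGc = pool.getCollectionUsage(); // null if unsupported
            sb.append(String.format("pool %s: used now %d bytes, after last GC %s%n",
                    pool.getName(), now.getUsed(),
                    afterGc == null ? "n/a" : afterGc.getUsed() + " bytes"));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

The "after last GC" figure for the old generation pool is the closer
approximation of live data. The "used now" figure includes garbage that has
not been collected yet.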

You can hook up JConsole to your daemon and hit the "GC" button to see how
much actual live data you've got.
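
If attaching JConsole is inconvenient, the same measurement can be sketched
in code. As far as I know, the JConsole button invokes the gc operation of
the Memory MXBean, which is documented as effectively equivalent to
System.gc(). The sketch below runs against its own JVM only, and the class
name LiveHeapProbe is an assumption made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper class, not part of Hadoop.
public class LiveHeapProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Heap currently in use, including garbage that has not been collected. */
    public static long usedHeapBytes() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /**
     * Requests a collection, then reads heap usage again. The result
     * approximates live data. The JVM may ignore the request, for example
     * when started with -XX:+DisableExplicitGC.
     */
    public static long usedHeapBytesAfterGc() {
        MEMORY.gc();
        return usedHeapBytes();
    }

    public static void main(String[] args) {
        // Allocate some short-lived garbage so the two readings can differ.
        byte[][] garbage = new byte[32][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1 << 20];
        }
        garbage = null;

        long before = usedHeapBytes();
        long after = usedHeapBytesAfterGc();
        System.out.printf("used before GC: %d bytes%n", before);
        System.out.printf("used after GC:  %d bytes%n", after);
    }
}
```

The second number is the one to compare against the configured heap size
when deciding whether the daemon has enough headroom.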

In general I'd agree with Allen's assessment that you're probably just
holding too many tasks in RAM. How many jobs are generally queued or running
at a time, and how many tasks does each of those jobs contain? If the total
is in the hundreds of thousands, you probably just need more heap allotted
to the JT.


On Tue, Feb 8, 2011 at 12:35 PM, Maxim Zizin <mzizin@iponweb.net> wrote:

> Allen,
> Thanks for your answer.
> Re: handful of jobs -- That was our first thought. But we looked at the
> logs and found nothing strange. Moreover, after a JT restart the time at
> which the peaks start shifted. When we restarted it once more, it shifted
> again. In all cases the first peak after a restart begins ~24 hours after
> the restart. So this seems to be some scheduled daily thing and does not
> depend on the jobs we run.
> Re: heap size -- We have a cluster of 12 slaves. 2GB seems to be enough as
> it uses ~1GB normally and ~1.5GB during peaks. Although we're going to
> increase JT's heap size up to 3GB tomorrow. This will at least give us more
> time to pause crons and restart JT until it goes out of heap space next
> time. Or am I wrong when I think that the fact that our JT uses 1-1.5 GB
> means that 2GB of heap is enough?
> On 2/8/2011 11:16 PM, Allen Wittenauer wrote:
>> On Feb 8, 2011, at 8:59 AM, Maxim Zizin wrote:
>>> Hi all,
>>> We monitor JT, NN and SNN memory usage and observe the following behavior
>>> in our Hadoop cluster. JT's heap size is set to 2000m. About 18 hours a day
>>> it uses ~1GB but every day roughly at the minute it was started its used
>>> memory increases to ~1.5GB and then decreases back to ~1GB in about 6 hours.
>>> Sometimes this takes a bit more than 6 hours, sometimes a bit less. I was
>>> wondering whether anyone here knows what JT does once a day that makes it
>>> use 1.5 times more memory than normally.
>>> We're so interested in JT memory usage because during the last two weeks
>>> the JT ran out of heap space twice. Both times, right after those daily
>>> peaks, when used memory was going down from 1.5GB to 1GB, it started
>>> increasing again until it got stuck at ~2.2GB. After that the JT becomes
>>> unresponsive and we have to restart it.
>>> We're using Cloudera's CDH2 version 0.20.1+169.113.
>>        Who knows what is happening in the CDH release?
>>        But in the normal job tracker, keep in mind that memory is consumed
>> by every individual task listed on the main page.  If you have some jobs
>> that have extremely high task counts or a lot of counters or really long
>> names or ..., then that is likely your problem.  Chances are good you have a
>> handful of jobs that are bad citizens that are getting scrolled off the page
>> at the same time every day.
>>        Also, for any grid of any significant size, 2g of heap is way too
>> small.
> --
> Regards, Max

Todd Lipcon
Software Engineer, Cloudera
