hadoop-mapreduce-dev mailing list archives

From Maxim Zizin <mzi...@iponweb.net>
Subject Re: JobTracker memory usage peaks once a day and OOM sometimes
Date Tue, 08 Feb 2011 21:14:22 GMT

Thanks for your attention.

Re: a few weeks ago -- I might have missed it. Will look for that thread.

Re: garbage -- I thought garbage collection would look more like a 
progressive rise that starts right after the previous drop and ends 
with another drop (a sawtooth pattern). What I actually see is an 
instant rise of about 1.5x, then a plateau, then an instant drop after 
~6 hours, then almost no growth until the next instant rise ~18 hours 
later. Sorry if that sounds naive -- I just don't know much about Java 
and GC. I use streaming and write mappers and reducers mostly in Perl.
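To check my own understanding of the point about GC, I tried a tiny 
standalone demo (this is not JobTracker code, just a sketch; the 
allocation sizes are arbitrary). It shows why "used heap" without a 
collection is misleading: the pre-GC number includes dead objects, and 
only the post-GC number approximates live data.

```java
// Sketch only: illustrates that heap "used" readings overstate live data
// until a full collection runs. Not JobTracker code.
public class LiveHeapDemo {
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Create ~40 MB of short-lived garbage that nothing references.
        for (int i = 0; i < 10; i++) {
            byte[] garbage = new byte[4 * 1024 * 1024];
        }
        System.out.println("used before GC: " + usedBytes() / (1024 * 1024) + " MB");
        // Same effect as JConsole's "Perform GC" button: a full-collection
        // hint, which the default collectors honor.
        System.gc();
        System.out.println("used after GC:  " + usedBytes() / (1024 * 1024) + " MB");
    }
}
```

The post-GC reading is the analogue of hooking JConsole to the daemon 
and hitting "GC", as suggested below.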

Re: how many jobs/tasks -- We have roughly a dozen jobs running at a 
time, each with ~5 mappers and 1-2 reducers. Occasionally we also have 
jobs with a few hundred mappers and a few tens of reducers, but those 
are not daily -- some are hourly, others run four times a day.
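In case it helps others reading the thread: when we raise the JT heap 
as discussed below, the change we plan is the usual hadoop-env.sh tweak 
on the JobTracker host. This is a sketch for stock 0.20-style installs; 
the 3000m value is just what we intend to try, and paths may differ 
under CDH2.

```shell
# conf/hadoop-env.sh -- raise the heap for the JobTracker daemon only,
# leaving the other daemons' JVMs at their defaults.
# 3000m is the value we plan to try; adjust for your cluster.
export HADOOP_JOBTRACKER_OPTS="-Xmx3000m $HADOOP_JOBTRACKER_OPTS"
```

A restart of the JobTracker is needed for the new -Xmx to take effect.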

On 2/8/2011 11:46 PM, Todd Lipcon wrote:
> Hi Maxim,
> I thought I responded to this question already a few weeks ago - maybe not
> :)
> Looking at the heap usage of a Java process using default garbage collectors
> is always misleading. In particular, unless you are using the concurrent
> mark and sweep (CMS) GC, a collection won't begin until the old generation
> is actually full. So, you will see a pattern of the heap filling up and then
> dropping back down, like you're describing.
> You can hook up JConsole to your daemon and hit the "GC" button to see how
> much actual live data you've got.
> In general I'd agree with Allen's assessment that you're probably just
> holding too many tasks in RAM. How many jobs are generally queued or running
> at a time, and how many tasks do each of those jobs contain? If you're in
> the hundreds of thousands there, it's probably just that you need more heap
> allotted to the JT.
> -Todd
> On Tue, Feb 8, 2011 at 12:35 PM, Maxim Zizin<mzizin@iponweb.net>  wrote:
>> Allen,
>> Thanks for your answer.
>> Re: handful of jobs -- That was our first thought, but we looked at the
>> logs and found nothing strange. Moreover, after a JT restart the time at
>> which the peaks start shifted, and when we restarted it once more it
>> shifted again. In all cases the first peak after a restart begins ~24
>> hours after the restart, so this seems to be some scheduled daily
>> activity that does not depend on the jobs we run.
>> Re: heap size -- We have a cluster of 12 slaves. 2GB seems to be enough,
>> as the JT normally uses ~1GB and ~1.5GB during peaks. Still, we're going
>> to increase the JT's heap size to 3GB tomorrow; that will at least give
>> us more time to pause crons and restart the JT before it runs out of heap
>> space next time. Or am I wrong to think that the fact that our JT uses
>> 1-1.5GB means 2GB of heap is enough?
>> On 2/8/2011 11:16 PM, Allen Wittenauer wrote:
>>> On Feb 8, 2011, at 8:59 AM, Maxim Zizin wrote:
>>>> Hi all,
>>>> We monitor JT, NN and SNN memory usage and observe the following behavior
>>>> in our Hadoop cluster. JT's heap size is set to 2000m. About 18 hours a day
>>>> it uses ~1GB but every day roughly at the minute it was started its used
>>>> memory increases to ~1.5GB and then decreases back to ~1GB in about 6 hours.
>>>> Sometimes this takes a bit more than 6 hours, sometimes a bit less. I was
>>>> wondering whether anyone here knows what JT does once a day that makes it
>>>> use 1.5 times more memory than normally.
>>>> We're so interested in JT memory usage because during last two weeks we
>>>> twice had JT getting out of heap space. Both times right after those daily
>>>> used memory peaks when it was going down from 1.5GB to 1GB it started
>>>> increasing again until it got stuck at ~2.2GB. After that it becomes
>>>> unresponsive and we have to restart it.
>>>> We're using Cloudera's CDH2 version 0.20.1+169.113.
>>>         Who knows what is happening in the CDH release?
>>>         But in the normal job tracker, keep in mind that memory is consumed
>>> by every individual task listed on the main page.  If you have some jobs
>>> that have extremely high task counts or a lot of counters or really long
>>> names or ..., then that is likely your problem.  Chances are good you have a
>>> handful of jobs that are bad citizens that are getting scrolled off the page
>>> at the same time every day.
>>>         Also, for any grid of any significant size, 2g of heap is way too
>>> small.
>> --
>> Regards, Max

Regards, Max
