hbase-user mailing list archives

From "George P. Stathis" <gstat...@traackr.com>
Subject Re: High OS Load Numbers when idle
Date Wed, 18 Aug 2010 00:28:05 GMT
OK, high load averages while idle are a confirmed Lucid issue:
https://bugs.launchpad.net/ubuntu-on-ec2/+bug/574910

Will downgrade back to 9.10 and try again. Thanks for helping out,
Ryan. If you hadn't confirmed that this is not normal, I would have
wasted a lot more time looking in the wrong places. Now I just have to
rebuild us a new cluster...
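
For anyone who hits the same thing, a quick way to see whether a box is
affected is to check the release and then watch the idle load with the
cluster shut down. Rough sketch (the comments describe what we'd expect
to see, not exact values):

  lsb_release -ds   # e.g. "Ubuntu 10.04 LTS"
  uname -r          # Lucid EC2 images typically run an -ec2 kernel build
  uptime            # a healthy idle box should sit near 0 load
  vmstat 2 5        # the 'r' column should mostly be 0 with nothing running

If the load stays above 1 with nothing running, it's likely the kernel
accounting issue from the bug above rather than anything hadoop/hbase is
doing.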

-GS

On Tue, Aug 17, 2010 at 7:39 PM, George P. Stathis <gstathis@traackr.com> wrote:
> Yeap. Load average definitely goes back down to 0.10 or below once
> hadoop/hbase is shut off. The same issue is happening on the slaves as
> well, BTW. We are running on Ubuntu 10.04, which is the only recent
> change other than moving to a cluster setup. We were running a
> pseudo-distributed setup before on one machine running Ubuntu 9.10;
> top showed just as much activity under %id, but the load average was
> close to zero when idle! That environment was hypervisor'ed as well,
> so I suspect that's not really the issue... :-) Know anyone running on
> an Ubuntu 10.04 setup?
>
> -GS
>
> On Tue, Aug 17, 2010 at 7:08 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>> This is odd: the run queue has quite a few processes in it, but the
>> actual CPU data says 'nothing's happening'.
>>
>> I can't tell which processes are in the run queue here. Does the load
>> average go down when you take hadoop/hbase down?
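>>
>> A quick way to check, assuming a stock 0.20-style layout where the stop
>> scripts live under each install's bin/ directory (paths below are just
>> illustrative):
>>
>>   $HBASE_HOME/bin/stop-hbase.sh
>>   $HADOOP_HOME/bin/stop-all.sh
>>   sleep 60 && uptime    # compare the 1-minute load with the daemons down
>>
>> then start everything back up and see whether the load climbs again.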
>>
>> If so, that seems to identify the locus of the problem, but it doesn't
>> really explain WHY this is so. Considering I've never really seen this
>> before, I'd just write it off to using a hypervisor'ed environment and
>> blame other people stealing all your CPUs :-)
>>
>> On Tue, Aug 17, 2010 at 4:03 PM, George P. Stathis <gstathis@traackr.com> wrote:
>>> vmstat 2 for 2 mins below. Looks like everything is idle (github
>>> gist paste if it's easier to read: http://gist.github.com/532512):
>>>
>>>
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>>  0  0      0 15097116 248428 1398444    0    0     0    50    5   24  0  0 100  0
>>>  0  0      0 15096948 248428 1398444    0    0     0     0  281  281  0  0 100  0
>>>  0  0      0 15096948 248428 1398444    0    0     0     0  279  260  0  0 100  0
>>>  0  0      0 15096948 248428 1398444    0    0     0     0  199  216  0  0 100  0
>>>  4  0      0 15096612 248428 1398448    0    0     0     0  528  467  0  0 100  0
>>>  0  0      0 15096612 248428 1398448    0    0     0     0  208  213  0  0 100  0
>>>  4  0      0 15096460 248428 1398448    0    0     0     0  251  261  0  0 100  0
>>>  0  0      0 15096460 248428 1398448    0    0     0    12  242  248  0  0 100  0
>>>  0  0      0 15096460 248428 1398448    0    0     0    34  228  230  0  0 100  0
>>>  0  0      0 15096476 248428 1398448    0    0     0     0  266  272  0  0 100  0
>>>  0  0      0 15096324 248428 1398448    0    0     0    10  179  206  0  0 100  0
>>>  1  0      0 15096340 248428 1398448    0    0     0     0  225  254  0  0 100  0
>>>  1  0      0 15096188 248428 1398448    0    0     0     0  263  245  0  0 100  0
>>>  0  0      0 15096188 248428 1398448    0    0     0     2  169  210  0  0 100  0
>>>  0  0      0 15096188 248428 1398448    0    0     0     0  201  238  0  0 100  0
>>>  0  0      0 15096036 248428 1398448    0    0     0    10  174  202  0  0 100  0
>>>  0  0      0 15096036 248428 1398448    0    0     0     6  207  222  0  0 100  0
>>>  0  0      0 15095884 248428 1398448    0    0     0     0  198  242  0  0 100  0
>>>  2  0      0 15095884 248428 1398448    0    0     0     0  177  215  0  0 100  0
>>>  0  0      0 15095884 248428 1398448    0    0     0     0  244  265  0  0 100  0
>>>  0  0      0 15095732 248428 1398448    0    0     0     4  197  222  0  0 100  0
>>>  6  0      0 15095732 248428 1398448    0    0     0     6  267  260  0  0 100  0
>>>  0  0      0 15095732 248428 1398448    0    0     0     0  240  239  0  0 100  0
>>>  0  0      0 15095580 248428 1398448    0    0     0     8  180  210  0  0 100  0
>>>  5  0      0 15095580 248428 1398448    0    0     0     0  193  224  0  0 100  0
>>>  1  0      0 15095580 248428 1398448    0    0     0    36  161  191  0  0 100  0
>>>  0  0      0 15095428 248428 1398448    0    0     0     0  176  216  0  0 100  0
>>>  4  0      0 15095428 248428 1398448    0    0     0     0  202  236  0  0 100  0
>>>  0  0      0 15095428 248428 1398448    0    0     0     6  191  220  0  0 100  0
>>>  1  0      0 15095428 248428 1398448    0    0     0     0  188  238  0  0 100  0
>>>  2  0      0 15095276 248428 1398448    0    0     0     0  174  206  0  0 100  0
>>>  1  0      0 15095276 248428 1398448    0    0     0     0  225  249  0  0 100  0
>>>  0  0      0 15095124 248428 1398448    0    0     0     0  222  263  0  0 100  0
>>>  1  0      0 15095124 248428 1398448    0    0     0     6  187  236  0  0 100  0
>>>  5  0      0 15094940 248428 1398448    0    0     0     0  453  434  0  0 100  0
>>>  4  0      0 15094788 248428 1398448    0    0     0     2  227  225  0  0 100  0
>>>  0  0      0 15094788 248428 1398448    0    0     0     0  213  236  0  0 100  0
>>>  4  0      0 15094788 248428 1398448    0    0     0     6  257  253  0  0 100  0
>>>  0  0      0 15094636 248428 1398448    0    0     0     0  215  230  0  0 100  0
>>>  0  0      0 15094652 248428 1398448    0    0     0     0  259  285  0  0 100  0
>>>  0  0      0 15094500 248428 1398448    0    0     0    14  194  219  0  0 100  0
>>>  0  0      0 15094516 248428 1398448    0    0     0     0  227  257  0  0 100  0
>>>  4  0      0 15094516 248428 1398448    0    0     0    36  266  263  0  0 100  0
>>>  1  0      0 15094516 248428 1398448    0    0     0     0  202  213  0  0 100  0
>>>  0  0      0 15094364 248428 1398448    0    0     0     0  204  240  0  0 100  0
>>>  0  0      0 15094212 248428 1398448    0    0     0     6  161  194  0  0 100  0
>>>  0  0      0 15094212 248428 1398448    0    0     0     0  191  215  0  0 100  0
>>>  1  0      0 15094212 248428 1398448    0    0     0     0  216  238  0  0 100  0
>>>  5  0      0 15094212 248428 1398448    0    0     0     6  169  202  0  0 100  0
>>>  0  0      0 15094060 248428 1398448    0    0     0     0  172  216  0  0 100  0
>>>  2  0      0 15094060 248428 1398448    0    0     0     6  201  196  0  0 100  0
>>>  1  0      0 15094060 248428 1398448    0    0     0     0  196  218  0  0 100  0
>>> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>>>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>>>  0  0      0 15093908 248428 1398448    0    0     0     0  206  236  0  0 100  0
>>>  4  0      0 15093908 248428 1398448    0    0     0     0  197  219  0  0 100  0
>>>  0  0      0 15093908 248428 1398448    0    0     0     0  186  227  0  0 100  0
>>>  0  0      0 15093756 248428 1398448    0    0     0     0  168  182  0  0 100  0
>>>  0  0      0 15093756 248428 1398448    0    0     0     0  206  239  0  0 100  0
>>>  0  0      0 15093604 248428 1398448    0    0     0     6  281  248  0  0 100  0
>>>  0  0      0 15093452 248428 1398448    0    0     0     0  185  198  0  0 100  0
>>>  5  0      0 15093452 248428 1398448    0    0     0     0  265  253  0  0 100  0
>>>  0  0      0 15093300 248428 1398448    0    0     0    36  194  211  0  0 100  0
>>>  0  0      0 15093300 248428 1398448    0    0     0     0  228  242  0  0 100  0
>>>  0  0      0 15093300 248428 1398448    0    0     0     0  290  262  0  0 100  0
>>>  0  0      0 15093300 248428 1398448    0    0     0     6  187  207  0  0 100  0
>>>
>>>
>>> On Tue, Aug 17, 2010 at 6:54 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>>> What does vmstat say? Run it like 'vmstat 2' for a minute or two and
>>>> paste the results.
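>>>>
>>>> Something like this grabs a couple of minutes' worth in one go (60
>>>> samples at 2-second intervals; the tee and the path are just a
>>>> convenience so you can paste the file afterwards):
>>>>
>>>>   vmstat 2 60 | tee /tmp/vmstat.out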
>>>>
>>>> With no CPU being consumed by java, it seems like there must be
>>>> another hidden variable here. Some zombied process perhaps, or some
>>>> kind of super IO wait, or something else.
>>>>
>>>> Since you are running in a hypervisor environment, I can't really say
>>>> what is happening to your instance, although one would think the load
>>>> average numbers would be unaffected by outside processes.
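>>>>
>>>> One thing worth a quick sanity check is the steal column, since that is
>>>> where hypervisor interference would normally show up. Nothing
>>>> hadoop-specific, just for example:
>>>>
>>>>   top -bn1 | grep 'Cpu(s)'    # the %st field is time stolen by the hypervisor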
>>>>
>>>>> On Tue, Aug 17, 2010 at 3:49 PM, George P. Stathis <gstathis@traackr.com> wrote:
>>>>> Actually, there is nothing in %wa but a ton sitting in %id. This is
>>>>> from the Master:
>>>>>
>>>>> top - 18:30:24 up 5 days, 20:10,  1 user,  load average: 2.55, 1.99, 1.25
>>>>> Tasks:  89 total,   1 running,  88 sleeping,   0 stopped,   0 zombie
>>>>> Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.2%st
>>>>> Mem:  17920228k total,  2795464k used, 15124764k free,   248428k buffers
>>>>> Swap:        0k total,        0k used,        0k free,  1398388k cached
>>>>>
>>>>> I have atop installed, which reports the hadoop/hbase java daemons as
>>>>> the most active processes (though they barely take any CPU time and
>>>>> are in sleep mode most of the time):
>>>>>
>>>>> ATOP - domU-12-31-39-18-1 2010/08/17  18:31:46              10 seconds elapsed
>>>>> PRC | sys   0.01s | user   0.00s | #proc     89 | #zombie    0 | #exit      0 |
>>>>> CPU | sys      0% | user      0% | irq       0% | idle    200% | wait      0% |
>>>>> cpu | sys      0% | user      0% | irq       0% | idle    100% | cpu000 w  0% |
>>>>> CPL | avg1   2.55 | avg5    2.12 | avg15   1.35 | csw     2397 | intr    2034 |
>>>>> MEM | tot   17.1G | free   14.4G | cache   1.3G | buff  242.6M | slab  193.1M |
>>>>> SWP | tot    0.0M | free    0.0M |              | vmcom   1.6G | vmlim   8.5G |
>>>>> NET | transport   | tcpi     330 | tcpo     169 | udpi     566 | udpo     147 |
>>>>> NET | network     | ipi      896 | ipo      316 | ipfrw      0 | deliv    896 |
>>>>> NET | eth0   ---- | pcki     777 | pcko     197 | si  248 Kbps | so   70 Kbps |
>>>>> NET | lo     ---- | pcki     119 | pcko     119 | si    9 Kbps | so    9 Kbps |
>>>>>
>>>>>   PID  CPU COMMAND-LINE                                                    1/1
>>>>> 17613   0% atop
>>>>> 17150   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
>>>>> 16527   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
>>>>> 16839   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
>>>>> 16735   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
>>>>> 17083   0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
>>>>>
>>>>> Same with htop:
>>>>>
>>>>>  PID USER     PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
>>>>> 16527 ubuntu    20   0 2352M   98M 10336 S  0.0  0.6  0:42.05
>>>>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
>>>>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
>>>>> -Dhadoop.log.dir=/var/log/h
>>>>> 16735 ubuntu    20   0 2403M 81544 10236 S  0.0  0.5  0:01.56
>>>>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
>>>>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
>>>>> -Dhadoop.log.dir=/var/log/h
>>>>> 17083 ubuntu    20   0 4557M 45388 10912 S  0.0  0.3  0:00.65
>>>>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
>>>>> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>>>>> -XX:+CMSIncrementalMode -server -XX:+Heap
>>>>>     1 root      20   0 23684  1880  1272 S  0.0  0.0  0:00.23 /sbin/init
>>>>>   587 root      20   0  247M  4088  2432 S  0.0  0.0 -596523h-14:-8
>>>>> /usr/sbin/console-kit-daemon --no-daemon
>>>>>  3336 root      20   0 49256  1092   540 S  0.0  0.0  0:00.36 /usr/sbin/sshd
>>>>> 16430 nobody    20   0 34408  3704  1060 S  0.0  0.0  0:00.01 gmond
>>>>> 17150 ubuntu    20   0 2519M  112M 11312 S  0.0  0.6 -596523h-14:-8
>>>>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
>>>>> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>>>>> -XX:+CMSIncrementalMode -server -XX
>>>>>
>>>>>
>>>>> So I'm a bit perplexed. Are there any hadoop/hbase-specific tricks
>>>>> that can reveal what these processes are doing?
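>>>>>
>>>>> I can grab thread dumps from the java daemons if that would help, e.g.
>>>>> something along the lines of (the pid and output path below are just
>>>>> placeholders):
>>>>>
>>>>>   jstack <hmaster-pid> > /tmp/hmaster-threads.txt
>>>>>
>>>>> Just let me know.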
>>>>>
>>>>> -GS
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>
>>>>>> It's not normal, but then again I don't have access to your machines
>>>>>> so I can only speculate.
>>>>>>
>>>>>> Does "top" show you which process is in %wa? If so, and it's a java
>>>>>> process, can you figure out what's going on in there?
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis <gstathis@gmail.com> wrote:
>>>>>> > Hello,
>>>>>> >
>>>>>> > We have just set up a new cluster on EC2 using Hadoop 0.20.2 and HBase
>>>>>> > 0.20.3. Our small setup as of right now consists of one master and four
>>>>>> > slaves with a replication factor of 2:
>>>>>> >
>>>>>> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode,
>>>>>> > 1 secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its
>>>>>> > own dedicated EBS drive)
>>>>>> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode,
>>>>>> > 1 tasktracker, 1 regionserver
>>>>>> >
>>>>>> > We have also installed Ganglia to monitor the cluster stats as we are
>>>>>> > about to run some performance tests but, right out of the box, we are
>>>>>> > noticing high system loads (especially on the master node) without any
>>>>>> > activity happening on the cluster. Of course, the CPUs are not being
>>>>>> > utilized at all, but Ganglia is reporting almost all nodes in the red as
>>>>>> > the 1, 5 and 15 minute load averages are all above 100% most of the time
>>>>>> > (i.e. there are more than two processes at a time competing for the 2
>>>>>> > CPUs' time).
>>>>>> >
>>>>>> > Question1: is this normal?
>>>>>> > Question2: does it matter since each process barely uses any of the
>>>>>> > CPU time?
>>>>>> >
>>>>>> > Thank you in advance and pardon the noob questions.
>>>>>> >
>>>>>> > -GS
>>>>>> >
>>>>>
>>>>
>>>
>>
>
