vmstat 2 for 2 mins below. Looks like everything is idle (GitHub
gist paste if it's easier to read: http://gist.github.com/532512):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
 0  0      0 15097116 248428 1398444    0    0     0    50    5   24  0  0 100  0
 0  0      0 15096948 248428 1398444    0    0     0     0  281  281  0  0 100  0
 0  0      0 15096948 248428 1398444    0    0     0     0  279  260  0  0 100  0
 0  0      0 15096948 248428 1398444    0    0     0     0  199  216  0  0 100  0
 4  0      0 15096612 248428 1398448    0    0     0     0  528  467  0  0 100  0
 0  0      0 15096612 248428 1398448    0    0     0     0  208  213  0  0 100  0
 4  0      0 15096460 248428 1398448    0    0     0     0  251  261  0  0 100  0
 0  0      0 15096460 248428 1398448    0    0     0    12  242  248  0  0 100  0
 0  0      0 15096460 248428 1398448    0    0     0    34  228  230  0  0 100  0
 0  0      0 15096476 248428 1398448    0    0     0     0  266  272  0  0 100  0
 0  0      0 15096324 248428 1398448    0    0     0    10  179  206  0  0 100  0
 1  0      0 15096340 248428 1398448    0    0     0     0  225  254  0  0 100  0
 1  0      0 15096188 248428 1398448    0    0     0     0  263  245  0  0 100  0
 0  0      0 15096188 248428 1398448    0    0     0     2  169  210  0  0 100  0
 0  0      0 15096188 248428 1398448    0    0     0     0  201  238  0  0 100  0
 0  0      0 15096036 248428 1398448    0    0     0    10  174  202  0  0 100  0
 0  0      0 15096036 248428 1398448    0    0     0     6  207  222  0  0 100  0
 0  0      0 15095884 248428 1398448    0    0     0     0  198  242  0  0 100  0
 2  0      0 15095884 248428 1398448    0    0     0     0  177  215  0  0 100  0
 0  0      0 15095884 248428 1398448    0    0     0     0  244  265  0  0 100  0
 0  0      0 15095732 248428 1398448    0    0     0     4  197  222  0  0 100  0
 6  0      0 15095732 248428 1398448    0    0     0     6  267  260  0  0 100  0
 0  0      0 15095732 248428 1398448    0    0     0     0  240  239  0  0 100  0
 0  0      0 15095580 248428 1398448    0    0     0     8  180  210  0  0 100  0
 5  0      0 15095580 248428 1398448    0    0     0     0  193  224  0  0 100  0
 1  0      0 15095580 248428 1398448    0    0     0    36  161  191  0  0 100  0
 0  0      0 15095428 248428 1398448    0    0     0     0  176  216  0  0 100  0
 4  0      0 15095428 248428 1398448    0    0     0     0  202  236  0  0 100  0
 0  0      0 15095428 248428 1398448    0    0     0     6  191  220  0  0 100  0
 1  0      0 15095428 248428 1398448    0    0     0     0  188  238  0  0 100  0
 2  0      0 15095276 248428 1398448    0    0     0     0  174  206  0  0 100  0
 1  0      0 15095276 248428 1398448    0    0     0     0  225  249  0  0 100  0
 0  0      0 15095124 248428 1398448    0    0     0     0  222  263  0  0 100  0
 1  0      0 15095124 248428 1398448    0    0     0     6  187  236  0  0 100  0
 5  0      0 15094940 248428 1398448    0    0     0     0  453  434  0  0 100  0
 4  0      0 15094788 248428 1398448    0    0     0     2  227  225  0  0 100  0
 0  0      0 15094788 248428 1398448    0    0     0     0  213  236  0  0 100  0
 4  0      0 15094788 248428 1398448    0    0     0     6  257  253  0  0 100  0
 0  0      0 15094636 248428 1398448    0    0     0     0  215  230  0  0 100  0
 0  0      0 15094652 248428 1398448    0    0     0     0  259  285  0  0 100  0
 0  0      0 15094500 248428 1398448    0    0     0    14  194  219  0  0 100  0
 0  0      0 15094516 248428 1398448    0    0     0     0  227  257  0  0 100  0
 4  0      0 15094516 248428 1398448    0    0     0    36  266  263  0  0 100  0
 1  0      0 15094516 248428 1398448    0    0     0     0  202  213  0  0 100  0
 0  0      0 15094364 248428 1398448    0    0     0     0  204  240  0  0 100  0
 0  0      0 15094212 248428 1398448    0    0     0     6  161  194  0  0 100  0
 0  0      0 15094212 248428 1398448    0    0     0     0  191  215  0  0 100  0
 1  0      0 15094212 248428 1398448    0    0     0     0  216  238  0  0 100  0
 5  0      0 15094212 248428 1398448    0    0     0     6  169  202  0  0 100  0
 0  0      0 15094060 248428 1398448    0    0     0     0  172  216  0  0 100  0
 2  0      0 15094060 248428 1398448    0    0     0     6  201  196  0  0 100  0
 1  0      0 15094060 248428 1398448    0    0     0     0  196  218  0  0 100  0
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 15093908 248428 1398448    0    0     0     0  206  236  0  0 100  0
 4  0      0 15093908 248428 1398448    0    0     0     0  197  219  0  0 100  0
 0  0      0 15093908 248428 1398448    0    0     0     0  186  227  0  0 100  0
 0  0      0 15093756 248428 1398448    0    0     0     0  168  182  0  0 100  0
 0  0      0 15093756 248428 1398448    0    0     0     0  206  239  0  0 100  0
 0  0      0 15093604 248428 1398448    0    0     0     6  281  248  0  0 100  0
 0  0      0 15093452 248428 1398448    0    0     0     0  185  198  0  0 100  0
 5  0      0 15093452 248428 1398448    0    0     0     0  265  253  0  0 100  0
 0  0      0 15093300 248428 1398448    0    0     0    36  194  211  0  0 100  0
 0  0      0 15093300 248428 1398448    0    0     0     0  228  242  0  0 100  0
 0  0      0 15093300 248428 1398448    0    0     0     0  290  262  0  0 100  0
 0  0      0 15093300 248428 1398448    0    0     0     6  187  207  0  0 100  0
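For what it's worth, a quick throwaway script like the sketch below could
average the us/sy/id/wa columns out of a capture like this (purely an
illustration; the vmstat.log filename and the 16-column layout of this
vmstat build are assumptions, not part of the paste above):

#!/usr/bin/env python
# Rough sketch only: average the cpu columns (us/sy/id/wa) from a saved
# "vmstat 2" capture. Assumes the output was saved to vmstat.log
# (hypothetical filename) and that each sample line has 16 columns.
import sys

us = sy = idle = wa = 0.0
samples = 0
path = sys.argv[1] if len(sys.argv) > 1 else "vmstat.log"
for line in open(path):
    fields = line.split()
    # Data lines are 16 numeric columns; skip the "procs"/"r b ..." headers.
    if len(fields) == 16 and fields[0].isdigit():
        us += int(fields[12])
        sy += int(fields[13])
        idle += int(fields[14])
        wa += int(fields[15])
        samples += 1

if samples:
    print("samples=%d  avg us=%.1f sy=%.1f id=%.1f wa=%.1f" % (
        samples, us / samples, sy / samples, idle / samples, wa / samples))

Against the numbers above it should come out to roughly id=100, which lines
up with what top is showing.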
On Tue, Aug 17, 2010 at 6:54 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> what does vmstat say? Run it like 'vmstat 2' for a minute or two and
> paste the results.
>
> With no CPU being consumed by Java, it seems like there must be
> another hidden variable here. Some zombie process perhaps, or some
> kind of super IO wait or something else.
>
> Since you are running in a hypervisor environment, I can't really say
> what is happening to your instance, although one would think the LA
> numbers would be unaffected by outside processes.
>
> On Tue, Aug 17, 2010 at 3:49 PM, George P. Stathis <gstathis@traackr.com> wrote:
>> Actually, there is nothing in %wa but a ton sitting in %id. This is
>> from the Master:
>>
>> top - 18:30:24 up 5 days, 20:10, 1 user, load average: 2.55, 1.99, 1.25
>> Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
>> Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.2%st
>> Mem: 17920228k total, 2795464k used, 15124764k free, 248428k buffers
>> Swap: 0k total, 0k used, 0k free, 1398388k cached
>>
>> I have atop installed which is reporting the hadoop/hbase java daemons
>> as the most active processes (barely taking any CPU time though and
>> most of the time in sleep mode):
>>
>> ATOP - domU-12-31-39-18-1 2010/08/17 18:31:46 10 seconds elapsed
>> PRC | sys   0.01s | user   0.00s | #proc     89 | #zombie    0 | #exit      0 |
>> CPU | sys      0% | user      0% | irq       0% | idle    200% | wait      0% |
>> cpu | sys      0% | user      0% | irq       0% | idle    100% | cpu000 w  0% |
>> CPL | avg1   2.55 | avg5   2.12  | avg15  1.35  | csw     2397 | intr    2034 |
>> MEM | tot   17.1G | free  14.4G  | cache  1.3G  | buff  242.6M | slab  193.1M |
>> SWP | tot    0.0M | free   0.0M  |              | vmcom   1.6G | vmlim   8.5G |
>> NET | transport   | tcpi    330  | tcpo    169  | udpi     566 | udpo     147 |
>> NET | network     | ipi     896  | ipo     316  | ipfrw      0 | deliv    896 |
>> NET | eth0   ---- | pcki    777  | pcko    197  | si  248 Kbps | so   70 Kbps |
>> NET | lo     ---- | pcki    119  | pcko    119  | si    9 Kbps | so    9 Kbps |
>>
>> PID CPU COMMAND-LINE                                                        1/1
>> 17613 0% atop
>> 17150 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
>> 16527 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
>> 16839 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
>> 16735 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server -Dcom.sun.managem
>> 17083 0% /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -XX:+HeapDumpOnOutOfMemor
>>
>> Same with htop:
>>
>> PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
>> 16527 ubuntu 20 0 2352M 98M 10336 S 0.0 0.6 0:42.05
>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
>> -Dhadoop.log.dir=/var/log/h
>> 16735 ubuntu 20 0 2403M 81544 10236 S 0.0 0.5 0:01.56
>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m -server
>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote
>> -Dhadoop.log.dir=/var/log/h
>> 17083 ubuntu 20 0 4557M 45388 10912 S 0.0 0.3 0:00.65
>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
>> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> -XX:+CMSIncrementalMode -server -XX:+Heap
>> 1 root 20 0 23684 1880 1272 S 0.0 0.0 0:00.23 /sbin/init
>> 587 root 20 0 247M 4088 2432 S 0.0 0.0 -596523h-14:-8
>> /usr/sbin/console-kit-daemon --no-daemon
>> 3336 root 20 0 49256 1092 540 S 0.0 0.0 0:00.36 /usr/sbin/sshd
>> 16430 nobody 20 0 34408 3704 1060 S 0.0 0.0 0:00.01 gmond
>> 17150 ubuntu 20 0 2519M 112M 11312 S 0.0 0.6 -596523h-14:-8
>> /usr/lib/jvm/java-6-sun/bin/java -Xmx2048m
>> -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC
>> -XX:+CMSIncrementalMode -server -XX
>>
>>
>> So I'm a bit perplexed. Are there any Hadoop/HBase-specific tricks
>> that can reveal what these processes are doing?
>>
>> -GS
>>
>>
>>
>> On Tue, Aug 17, 2010 at 6:14 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>
>>> It's not normal, but then again I don't have access to your machines
>>> so I can only speculate.
>>>
>>> Does "top" show you which process is in %wa? If so and it's a java
>>> process, can you figure what's going on in there?
>>>
>>> J-D
>>>
>>> On Tue, Aug 17, 2010 at 11:03 AM, George Stathis <gstathis@gmail.com> wrote:
>>> > Hello,
>>> >
>>> > We have just setup a new cluster on EC2 using Hadoop 0.20.2 and HBase
>>> > 0.20.3. Our small setup as of right now consists of one master and four
>>> > slaves with a replication factor of 2:
>>> >
>>> > Master: xLarge instance with 2 CPUs and 17.5 GB RAM - runs 1 namenode, 1
>>> > secondarynamenode, 1 jobtracker, 1 hbasemaster, 1 zookeeper (uses its own
>>> > dedicated EBS drive)
>>> > Slaves: xLarge instance with 2 CPUs and 17.5 GB RAM each - run 1 datanode,
>>> > 1 tasktracker, 1 regionserver
>>> >
>>> > We have also installed Ganglia to monitor the cluster stats as we are about
>>> > to run some performance tests but, right out of the box, we are noticing
>>> > high system loads (especially on the master node) without any activity
>>> > happening on the cluster. Of course, the CPUs are not being utilized at all,
>>> > but Ganglia is reporting almost all nodes in the red as the 1, 5 and 15
>>> > minute load averages are all above 100% most of the time (i.e. there are more
>>> > than two processes at a time competing for the 2 CPUs' time).
>>> >
>>> > Question1: is this normal?
>>> > Question2: does it matter since each process barely uses any of the CPU
>>> > time?
>>> >
>>> > Thank you in advance and pardon the noob questions.
>>> >
>>> > -GS
>>> >
>>
>