logging-log4j-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Matt Sicker <boa...@gmail.com>
Subject Re: log4j2 2.0-rc1 issues on AIX
Date Thu, 20 Mar 2014 13:10:45 GMT
Perhaps lmax disruptor doesn't work properly in the IBM JVM?

On Tuesday, 18 March 2014, Chris Graham <chrisgwarp@gmail.com> wrote:

> JStack is a Sun thing. This is the IBM JDK on AIX.
> I've run the tprof command twice and verified it.
>
> The full work though follows.
>
> The output from topas (same as top, effectively) is:
>
> Topas Monitor for host:    XXXXXXXXXXXXXXX      EVENTS/QUEUES    FILE/TTY
> Wed Mar 19 14:49:55 2014   Interval:  2         Cswitch    3581
> Readch      686
>                                                 Syscall    2763  Writech
> 1378
> CPU  User%  Kern%  Wait%  Idle%  Physc   Entc   Reads         7
> Rawin         0
> ALL   48.8    1.2    0.0   50.1   1.03   51.7   Writes        5
> Ttyout      643
>                                                 Forks         0
> Igets         0
> Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out  Execs         0
> Namei        81
> Total     2.8     10.0     7.5     1.1     1.7  Runqueue    1.0
> Dirblk        0
>                                                 Waitqueue   0.0
> Disk    Busy%     KBPS     TPS KB-Read KB-Writ                   MEMORY
> Total     0.0      0.0     0.0     0.0     0.0  PAGING           Real,MB
> 20480
>                                                 Faults        3  % Comp
> 44
> FileSystem        KBPS     TPS KB-Read KB-Writ  Steals        0  % Noncomp
> 54
> Total              0.6     1.5    0.6    0.0    PgspIn        0  % Client
> 54
>                                                 PgspOut       0
> Name            PID  CPU%  PgSp Owner           PageIn        0  PAGING
> SPACE
> java        9437312  48.6 739.1 wasadmin        PageOut       0  Size,MB
> 15552
>
> Process with a PID of 9437312 is the WebSphere instance that runs Archiva,
> the problem in question.
>
> We then use a native AIX tool, tprof to examine that process, and see what
> it's doing:
>
> tprof -j -P 9437312 -skex sleep 60
>
> This generates sleep.prof, and the relevant section is:
>
> Configuration information
> =========================
> System: AIX 6.1 Node: au02qap207teax2 Machine: 00C43D204C00
> Tprof command was:
>     tprof -j -P 9437312 -skex sleep 60
> Trace command was:
>     /usr/bin/trace -ad -M -L 2438933299 -T 500000 -j
> 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,234,5D8, -o -
> Total Samples = 24008
> Traced Time = 60.02s (out of a total execution time of 60.02s)
>
> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>
> Process                          Freq  Total Kernel   User Shared  Other
> Java
> =======                          ====  ===== ======   ==== ======  =====
> ====
> wait                                4  64.29  64.29   0.00   0.00   0.00
> 0.00
> java                              182  34.66   1.97   0.00  32.65   0.05
> 0.00
> /usr/sbin/syncd                     2   0.17   0.17   0.00   0.00   0.00
> 0.00
> sshd:                               1   0.14   0.09   0.05   0.01   0.00
> 0.00
> /usr/bin/ps                        13   0.12   0.10   0.00   0.03   0.00
> 0.00
> /usr/bin/sh                        27   0.12   0.11   0.00   0.00   0.00
> 0.00
> httpd                              12   0.09   0.05   0.01   0.03   0.00
> 0.00
> /usr/bin/scp                        1   0.09   0.07   0.00   0.01   0.00
> 0.00
> /opt/freeware/bin/readlink          6   0.04   0.04   0.00   0.00   0.00
> 0.00
> /usr/bin/tprof                      1   0.03   0.00   0.01   0.02   0.00
> 0.00
> kulagent                            7   0.03   0.02   0.00   0.00   0.00
> 0.00
> swapper                             1   0.02   0.02   0.00   0.00   0.00
> 0.00
> /usr/bin/grep                       5   0.02   0.02   0.00   0.00   0.00
> 0.00
> /usr/sbin/getty                     1   0.02   0.02   0.00   0.00   0.00
> 0.00
> kuxagent                            3   0.02   0.01   0.00   0.00   0.00
> 0.00
> lrud                                1   0.02   0.02   0.00   0.00   0.00
> 0.00
> srmAIX                              3   0.01   0.01   0.00   0.00   0.00
> 0.00
> /opt/tivoli/cit/bin/wscanuse        1   0.01   0.01   0.00   0.00   0.00
> 0.00
> psmd                                1   0.01   0.01   0.00   0.00   0.00
> 0.00
> xmgc                                1   0.01   0.01   0.00   0.00   0.00
> 0.00
> nfs_stat                            2   0.01   0.01   0.00   0.00   0.00
> 0.00
> /usr/bin/topasrec                   2   0.01   0.01   0.00   0.00   0.00
> 0.00
> /usr/bin/xmtopas                    1   0.01   0.00   0.00   0.00   0.00
> 0.00
> kcawd                               1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/sbin/snmpd                     1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/sbin/snmpmibd                  1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/bin/date                       1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/bin/entstat                    1   0.00   0.00   0.00   0.00   0.00
> 0.00
> gil                                 1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/bin/vmstat                     1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/bin/ln                         1   0.00   0.00   0.00   0.00   0.00
> 0.00
> stat_daemon                         1   0.00   0.00   0.00   0.00   0.00
> 0.00
> j2pg                                1   0.00   0.00   0.00   0.00   0.00
> 0.00
> /usr/sbin/netstat                   1   0.00   0.00   0.00   0.00   0.00
> 0.00
> =======                          ====  ===== ======   ==== ======  =====
> ====
> Total                             289 100.00  67.11   0.07  32.77   0.05
> 0.00
>
> Process               PID      TID  Total Kernel   User Shared  Other
> Java
> =======               ===      ===  ===== ======   ==== ======  =====
> ====
> java              9437312 46072057  23.76   0.05   0.00  23.70   0.00
> 0.00
>
> Which gives me the needed TID, 46072057 = 0x2BF00F9.
>
> So to trigger a non-fatal heap dump:
>
> kill -3 9437312
>
> and then we look into the created javacore.20140318.173250.9437312.0002.txt
> file for references to 0x2BF00F9:
>
> 3XMTHREADINFO      "AsyncLoggerConfig-1" J9VMThread:0x0000000031D14600,
> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508,
> state:CW, prio=5
> 3XMJAVALTHREAD            (java/lang/Thread getId:0x6A, isDaemon:true)
> 3XMTHREADINFO1            (native thread ID:0x2BF00F9, native priority:0x5,
> native policy:UNKNOWN)
> 3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
> 3XMTHREADINFO3           Java callstack:
> 4XESTACKTRACE                at sun/misc/Unsafe.park(Native Method)
> 4XESTACKTRACE                at
> java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332)
> 4XESTACKTRACE                at
>
> com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66)
> 4XESTACKTRACE                at
>
> com/lmax/disruptor/SleepingWaitStrategy.waitFor(SleepingWaitStrategy.java:39)
> 4XESTACKTRACE                at
>
> com/lmax/disruptor/ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55)
> 4XESTACKTRACE                at
> com/lmax/disruptor/BatchEventProcessor.run(BatchEventProcessor.java:115)
> 4XESTACKTRACE                at
>
> java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 4XESTACKTRACE                at
>
> java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 4XESTACKTRACE                at java/lang/Thread.run(Thread.java:773)
> 3XMTHREADINFO3           Native callstack:
> 4XENATIVESTACK               _event_wait+0x2b8 (0x09000000007E7D3C
> [libpthreads.a+0x16d3c])
> 4XENATIVESTACK               _cond_wait_local+0x4e4 (0x09000000007F5A48
> [libpthreads.a+0x24a48])
> 4XENATIVESTACK               _cond_wait+0xbc (0x09000000007F6020
> [libpthreads.a+0x25020])
> 4XENATIVESTACK               pthread_cond_wait+0x1a8 (0x09000000007F6C8C
> [libpthreads.a+0x25c8c])
> 4XENATIVESTACK               (0x0900000001223014 [libj9thr26.so+0x6014])
> 4XENATIVESTACK               (0x0900000001222C60 [libj9thr26.so+0x5c60])
> 4XENATIVESTACK               (0x090000000116AE58 [libj9vm26.so+0xfe58])
> 4XENATIVESTACK               (0x090000000116B17C [libj9vm26.so+0x1017c])
> 4XENATIVESTACK               (0x0900000001810528
> [libjclscar_26.so+0x5c528])
> 4XENATIVESTACK               (0x0900000001813B98
> [libjclscar_26.so+0x5fb98])
> 4XENATIVESTACK               (0x0900000001161764 [libj9vm26.so+0x6764])
> 4XENATIVESTACK               (0x0900000001239CA0 [libj9prt26.so+0x2ca0])
> 4XENATIVESTACK               (0x09000000011615D4 [libj9vm26.so+0x65d4])
> 4XENATIVESTACK               (0x090000000121FAF4 [libj9thr26.so+0x2af4])
> 4XENATIVESTACK               _pthread_body+0xf0 (0x09000000007D4D34
> [libpthreads.a+0x3d34])
> NULL
>
> Would you like me to attach the complete sleep.tprof and javacore.txt file
> to the Jira ticket that I just created?
>
> https://issues.apache.org/jira/browse/LOG4J2-571
>
> Thanks for looking, this has been driving me nuts.
>
> -Chris
>
>
>
> On Wed, Mar 19, 2014 at 2:45 PM, Remko Popma <remko.popma@gmail.com>
> wrote:
>
> > Just to double-check: when using blocking wait, and running jstack, does
> > BlockingWaitStrategy appear in the stack trace?
> > Also, it it possible to double-check (perhaps attach VisualVM) that it
> > definitely is the AsyncLoggerConfig-1 thread that consumes so much CPU?
> >
> >
> > Sent from my iPhone
> >
> > > On 2014/03/19, at 12:31, Chris Graham <chrisgwarp@gmail.com> wrote:
> > >
> > > I have tried both Block and Sleep (the default), but not Yield. No
> > > discernable difference.
> > >
> > >
> > >> On Wed, Mar 19, 2014 at 1:53 PM, Remko Popma <remko.popma@gmail.com>
> > wrote:
> > >>
> > >> As tweeted, I suggest trying the blocking wait strategy. Can you run a
> > >> jstack dump (and perhaps attach result to a Jira ticket)? In the
> > attached
> > >> stack trace below, the AsyncLoggerConfig-1 thread seems to be parked,
> > >> waiting for a new log event... Doesn't explain high CPU usage...
> > >>
> > >> Sent from my iPhone
> > >>
> > >>> On 2014/03/19, at 10:27, Chris Graham <chrisgwarp@gmail.com>
wrote:
> > >>>
> > >>> Hello Everyone.
> > >>>
> > >>> In this instance, I'm in indirect used of log4j2 2.0-rc1, as it's in
> > the
> > >>> web app that I'm using, Apache Archiva 2.0.1.
> > >>>
> > >>> The issue is that when running under WebSphere 8.5.0.2 (obviously on
> > the
> > >>> IBM JDK, 1.6) on AIX 6.1 TL8, Apache Archiva when it's doing nothing,
> > is
> > >>> sitting idle on around 50% CPU.
> > >>>
> > >>> Obviosuly, this is not good!
> > >>>
> > >>> I've performed the AIX native analysis, to get the native thread ID,
> > >> mapped
> > >>> it to a Java thread it, triggered a heap dump, and I've found this
as
> > the
> > >>> culprit:
> > >>>
> > >>> 3XMTHREADINFO      "AsyncLoggerConfig-1"
> J9VMThread:0x0000000031D14600,
> > >>> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508,
> > >>> state:CW, prio=5
> > >>> 3XMJAVALTHREAD            (java/lang/Thread getId:0x6A,
> isDaemon:true)
> > >>> 3XMTHREADINFO1            (native thread ID:0x2BF00F9, native
> > >> priority:0x5,
> > >>> native policy:UNKNOWN)
> > >>> 3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0
> > (0x0)
> > >>> 3XMTHREADINFO3           Java callstack:
> > >>> 4XESTACKTRACE                at sun/misc/Unsafe.park(Native Method)
> > >>> 4XESTACKTRACE                at
> > >>>
> java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332)
> > >>> 4XESTACKTRACE                at
> > >>
> >
> com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66)
> > >>> 4XESTACKTRACE



-- 
Matt Sicker <boards@gmail.com>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message