logging-log4j-user mailing list archives

From Chris Graham <chrisgw...@gmail.com>
Subject Re: log4j2 2.0-rc1 issues on AIX
Date Fri, 21 Mar 2014 02:46:48 GMT
+1 to the default of Block!

1ns is too small. No wonder it sucked CPU. :-)

Thanks for looking!

-Chris

Sent from my iPhone

On 21/03/2014, at 1:32 PM, Remko Popma <remko.popma@gmail.com> wrote:

> Took another look at the Disruptor SleepingWait strategy. It actually uses
> a back-off strategy. From the javadoc:
> 
> Sleeping strategy that initially spins, then uses a Thread.yield(), and
> eventually sleeps for the minimum number of nanos the OS and JVM will allow
> while the {@link com.lmax.disruptor.EventProcessor}s are waiting on a
> barrier. This strategy is a good compromise between performance and CPU
> resource. Latency spikes can occur after quiet periods.
> 
> The Disruptor SleepingWait strategy code eventually calls
> LockSupport.parkNanos(1L);
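To make that back-off concrete, here is a rough sketch of the spin, then yield, then park progression the javadoc describes. This is illustrative only, not the actual Disruptor source; the retry thresholds here are invented:

```java
// Sketch of a SleepingWaitStrategy-style back-off. Thresholds (200/100)
// are illustrative assumptions, not Disruptor's real constants.
import java.util.concurrent.locks.LockSupport;

public class BackoffSketch {
    // One step of the back-off: busy-spin first, then yield, then park.
    static int applyWaitMethod(int counter) {
        if (counter > 100) {
            --counter;                 // phase 1: busy spin
        } else if (counter > 0) {
            --counter;
            Thread.yield();            // phase 2: give up the time slice
        } else {
            LockSupport.parkNanos(1L); // phase 3: shortest sleep the OS/JVM allows
        }
        return counter;
    }

    public static void main(String[] args) {
        int counter = 200;
        for (int i = 0; i < 250; i++) {
            counter = applyWaitMethod(counter);
        }
        // After the counter reaches 0 it stays there: every further wait
        // is a parkNanos(1L) request.
        System.out.println(counter);
    }
}
```

The point is that on an idle queue the consumer ends up in phase 3, repeatedly asking the OS for a 1 ns sleep.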
> Different platforms have different timer resolution (I think Windows cannot
> go smaller than ~15 millis), and it is possible that AIX has a more
> accurate clock.
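The platform-dependent minimum can be measured directly. A small sketch (my addition; LockSupport is standard JDK, but the observed numbers will vary by OS and JVM):

```java
// Measures the effective cost of LockSupport.parkNanos(1L).
// On some platforms this returns in microseconds; on others the timer
// granularity can be on the order of 10-15 ms.
import java.util.concurrent.locks.LockSupport;

public class ParkResolution {
    public static void main(String[] args) {
        // Brief warm-up so JIT and timer effects settle a little.
        for (int i = 0; i < 100; i++) {
            LockSupport.parkNanos(1L);
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 50; i++) {
            long t0 = System.nanoTime();
            LockSupport.parkNanos(1L);   // ask for 1 ns; get whatever the
            long t1 = System.nanoTime(); // OS/JVM minimum actually is
            best = Math.min(best, t1 - t0);
        }
        System.out.println("min parkNanos(1) latency: " + best + " ns");
    }
}
```

A very small result means the waiting thread wakes (and burns CPU) extremely often, which would match the symptom seen on this AIX box.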
> 
> I'm beginning to think perhaps BlockingWait should be the default for log4j.
> 
> On Fri, Mar 21, 2014 at 9:33 AM, Chris Graham <chrisgwarp@gmail.com> wrote:
> 
>> The AIX system clock does not have the same time base as most Intel boxes.
>> What is the sleep time in the sleep strategy? If it's being derived, it
>> might be too small. ???
>> 
>> Just a thought.
>> 
>> To further complicate matters, this particular lpar was uncapped, which
>> means that it can steal CPU from other lpars that are not as busy. So the
>> number of active CPU's can dynamically vary.
>> 
>> -Chris
>> 
>> 
>> 
>> On Fri, Mar 21, 2014 at 9:01 AM, Remko Popma <remko.popma@gmail.com>
>> wrote:
>> 
>>> No, it turned out that the docs for Apache Archiva were incorrect and the
>>> WaitStrategy was effectively still SleepingWaitStrategy. Using the correct
>>> config to specify BlockingWaitStrategy solved the issue. (LOG4J2-571)
>>> 
>>> FYI, the wait strategy determines what the consumer thread does when the
>>> queue is empty & it's waiting for events. Some specialized apps want to
>>> avoid the latency of waking up a blocked thread, so there are a number of
>>> options with different trade-offs, with busy spin on one end of the
>>> spectrum and blocking on the other. For log4j I reduced the set of
>>> available options (no busy spin), and chose SleepingWait as the default.
>>> This had the best performance in my testing. Until now I hadn't seen any
>>> excessive CPU usage.
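For reference, the strategy is selected with a system property set before log4j2 initializes. A minimal sketch, assuming the property name `AsyncLoggerConfig.WaitStrategy` from the log4j2 2.x async-logging documentation (verify against your exact release):

```java
// Hedged sketch: selects the Block wait strategy for AsyncLoggerConfig.
// The property name is an assumption based on the log4j2 2.x async
// logging docs; check it for your version before relying on it.
public class WaitStrategyConfig {
    public static void main(String[] args) {
        // Must be set before the first Logger is obtained.
        System.setProperty("AsyncLoggerConfig.WaitStrategy", "Block");
        System.out.println(System.getProperty("AsyncLoggerConfig.WaitStrategy"));
    }
}
```

In a container like WebSphere the same property would typically be passed as a `-D` JVM argument rather than set in code.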
>>> 
>>> Sent from my iPhone
>>> 
>>>> On 2014/03/20, at 22:10, Matt Sicker <boards@gmail.com> wrote:
>>>> 
>>>> Perhaps lmax disruptor doesn't work properly in the IBM JVM?
>>>> 
>>>>> On Tuesday, 18 March 2014, Chris Graham <chrisgwarp@gmail.com> wrote:
>>>>> 
>>>>> JStack is a Sun thing. This is the IBM JDK on AIX.
>>>>> I've run the tprof command twice and verified it.
>>>>> 
>>>>> The full walkthrough follows.
>>>>> 
>>>>> The output from topas (same as top, effectively) is:
>>>>> 
>>>>> Topas Monitor for host:    XXXXXXXXXXXXXXX      EVENTS/QUEUES    FILE/TTY
>>>>> Wed Mar 19 14:49:55 2014   Interval:  2         Cswitch    3581  Readch      686
>>>>>                                                 Syscall    2763  Writech    1378
>>>>> CPU  User%  Kern%  Wait%  Idle%  Physc   Entc   Reads         7  Rawin         0
>>>>> ALL   48.8    1.2    0.0   50.1   1.03   51.7   Writes        5  Ttyout      643
>>>>>                                                 Forks         0  Igets         0
>>>>> Network  KBPS   I-Pack  O-Pack   KB-In  KB-Out  Execs         0  Namei        81
>>>>> Total     2.8     10.0     7.5     1.1     1.7  Runqueue    1.0  Dirblk        0
>>>>>                                                 Waitqueue   0.0
>>>>> Disk    Busy%     KBPS     TPS KB-Read KB-Writ                   MEMORY
>>>>> Total     0.0      0.0     0.0     0.0     0.0  PAGING           Real,MB   20480
>>>>>                                                 Faults        3  % Comp       44
>>>>> FileSystem        KBPS     TPS KB-Read KB-Writ  Steals        0  % Noncomp    54
>>>>> Total              0.6     1.5    0.6    0.0    PgspIn        0  % Client     54
>>>>>                                                 PgspOut       0
>>>>> Name            PID  CPU%  PgSp Owner           PageIn        0  PAGING SPACE
>>>>> java        9437312  48.6 739.1 wasadmin        PageOut       0  Size,MB   15552
>>>>> 
>>>>> Process with a PID of 9437312 is the WebSphere instance that runs
>>>>> Archiva, the problem in question.
>>>>> 
>>>>> We then use a native AIX tool, tprof, to examine that process and see
>>>>> what it's doing:
>>>>> 
>>>>> tprof -j -P 9437312 -skex sleep 60
>>>>> 
>>>>> This generates sleep.prof, and the relevant section is:
>>>>> 
>>>>> Configuration information
>>>>> =========================
>>>>> System: AIX 6.1 Node: au02qap207teax2 Machine: 00C43D204C00
>>>>> Tprof command was:
>>>>>   tprof -j -P 9437312 -skex sleep 60
>>>>> Trace command was:
>>>>>   /usr/bin/trace -ad -M -L 2438933299 -T 500000 -j
>>>>> 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,234,5D8, -o -
>>>>> Total Samples = 24008
>>>>> Traced Time = 60.02s (out of a total execution time of 60.02s)
>>>>> 
>>>>> 
>>>>> <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>> 
>>>>> Process                          Freq  Total Kernel   User Shared  Other   Java
>>>>> =======                          ====  ===== ======   ==== ======  =====   ====
>>>>> wait                                4  64.29  64.29   0.00   0.00   0.00   0.00
>>>>> java                              182  34.66   1.97   0.00  32.65   0.05   0.00
>>>>> /usr/sbin/syncd                     2   0.17   0.17   0.00   0.00   0.00   0.00
>>>>> sshd:                               1   0.14   0.09   0.05   0.01   0.00   0.00
>>>>> /usr/bin/ps                        13   0.12   0.10   0.00   0.03   0.00   0.00
>>>>> /usr/bin/sh                        27   0.12   0.11   0.00   0.00   0.00   0.00
>>>>> httpd                              12   0.09   0.05   0.01   0.03   0.00   0.00
>>>>> /usr/bin/scp                        1   0.09   0.07   0.00   0.01   0.00   0.00
>>>>> /opt/freeware/bin/readlink          6   0.04   0.04   0.00   0.00   0.00   0.00
>>>>> /usr/bin/tprof                      1   0.03   0.00   0.01   0.02   0.00   0.00
>>>>> kulagent                            7   0.03   0.02   0.00   0.00   0.00   0.00
>>>>> swapper                             1   0.02   0.02   0.00   0.00   0.00   0.00
>>>>> /usr/bin/grep                       5   0.02   0.02   0.00   0.00   0.00   0.00
>>>>> /usr/sbin/getty                     1   0.02   0.02   0.00   0.00   0.00   0.00
>>>>> kuxagent                            3   0.02   0.01   0.00   0.00   0.00   0.00
>>>>> lrud                                1   0.02   0.02   0.00   0.00   0.00   0.00
>>>>> srmAIX                              3   0.01   0.01   0.00   0.00   0.00   0.00
>>>>> /opt/tivoli/cit/bin/wscanuse        1   0.01   0.01   0.00   0.00   0.00   0.00
>>>>> psmd                                1   0.01   0.01   0.00   0.00   0.00   0.00
>>>>> xmgc                                1   0.01   0.01   0.00   0.00   0.00   0.00
>>>>> nfs_stat                            2   0.01   0.01   0.00   0.00   0.00   0.00
>>>>> /usr/bin/topasrec                   2   0.01   0.01   0.00   0.00   0.00   0.00
>>>>> /usr/bin/xmtopas                    1   0.01   0.00   0.00   0.00   0.00   0.00
>>>>> kcawd                               1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/sbin/snmpd                     1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/sbin/snmpmibd                  1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/bin/date                       1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/bin/entstat                    1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> gil                                 1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/bin/vmstat                     1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/bin/ln                         1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> stat_daemon                         1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> j2pg                                1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> /usr/sbin/netstat                   1   0.00   0.00   0.00   0.00   0.00   0.00
>>>>> =======                          ====  ===== ======   ==== ======  =====   ====
>>>>> Total                             289 100.00  67.11   0.07  32.77   0.05   0.00
>>>>> 
>>>>> Process               PID      TID  Total Kernel   User Shared  Other   Java
>>>>> =======               ===      ===  ===== ======   ==== ======  =====   ====
>>>>> java              9437312 46072057  23.76   0.05   0.00  23.70   0.00   0.00
>>>>> 
>>>>> Which gives me the needed TID, 46072057 = 0x2BF00F9.
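The decimal-to-hex mapping used here can be checked with a line of Java (values taken from the tprof output above):

```java
// Verifies the TID conversion: tprof reports thread IDs in decimal,
// while the javacore file prints native thread IDs in hex.
public class TidToHex {
    public static void main(String[] args) {
        long tid = 46072057L; // TID from the tprof per-thread report
        System.out.println("0x" + Long.toHexString(tid).toUpperCase());
    }
}
```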
>>>>> 
>>>>> So to trigger a non-fatal heap dump:
>>>>> 
>>>>> kill -3 9437312
>>>>> 
>>>>> and then we look into the created
>>>>> javacore.20140318.173250.9437312.0002.txt file for references to
>>>>> 0x2BF00F9:
>>>>> 
>>>>> 3XMTHREADINFO      "AsyncLoggerConfig-1" J9VMThread:0x0000000031D14600,
>>>>> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508,
>>>>> state:CW, prio=5
>>>>> 3XMJAVALTHREAD            (java/lang/Thread getId:0x6A, isDaemon:true)
>>>>> 3XMTHREADINFO1            (native thread ID:0x2BF00F9, native priority:0x5,
>>>>> native policy:UNKNOWN)
>>>>> 3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
>>>>> 3XMTHREADINFO3           Java callstack:
>>>>> 4XESTACKTRACE                at sun/misc/Unsafe.park(Native Method)
>>>>> 4XESTACKTRACE                at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332)
>>>>> 4XESTACKTRACE                at com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66)
>>>>> 4XESTACKTRACE                at com/lmax/disruptor/SleepingWaitStrategy.waitFor(SleepingWaitStrategy.java:39)
>>>>> 4XESTACKTRACE                at com/lmax/disruptor/ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55)
>>>>> 4XESTACKTRACE                at com/lmax/disruptor/BatchEventProcessor.run(BatchEventProcessor.java:115)
>>>>> 4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>>> 4XESTACKTRACE                at java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>>> 4XESTACKTRACE                at java/lang/Thread.run(Thread.java:773)
>>>>> 3XMTHREADINFO3           Native callstack:
>>>>> 4XENATIVESTACK               _event_wait+0x2b8 (0x09000000007E7D3C [libpthreads.a+0x16d3c])
>>>>> 4XENATIVESTACK               _cond_wait_local+0x4e4 (0x09000000007F5A48 [libpthreads.a+0x24a48])
>>>>> 4XENATIVESTACK               _cond_wait+0xbc (0x09000000007F6020 [libpthreads.a+0x25020])
>>>>> 4XENATIVESTACK               pthread_cond_wait+0x1a8 (0x09000000007F6C8C [libpthreads.a+0x25c8c])
>>>>> 4XENATIVESTACK               (0x0900000001223014 [libj9thr26.so+0x6014])
>>>>> 4XENATIVESTACK               (0x0900000001222C60 [libj9thr26.so+0x5c60])
>>>>> 4XENATIVESTACK               (0x090000000116AE58 [libj9vm26.so+0xfe58])
>>>>> 4XENATIVESTACK               (0x090000000116B17C [libj9vm26.so+0x1017c])
>>>>> 4XENATIVESTACK               (0x0900000001810528 [libjclscar_26.so+0x5c528])
>>>>> 4XENATIVESTACK               (0x0900000001813B98 [libjclscar_26.so+0x5fb98])
>>>>> 4XENATIVESTACK               (0x0900000001161764 [libj9vm26.so+0x6764])
>>>>> 4XENATIVESTACK               (0x0900000001239CA0 [libj9prt26.so+0x2ca0])
>>>>> 4XENATIVESTACK               (0x09000000011615D4 [libj9vm26.so+0x65d4])
>>>>> 4XENATIVESTACK               (0x090000000121FAF4 [libj9thr26.so+0x2af4])
>>>>> 4XENATIVESTACK               _pthread_body+0xf0 (0x09000000007D4D34 [libpthreads.a+0x3d34])
>>>>> NULL
>>>>> 
>>>>> Would you like me to attach the complete sleep.tprof and javacore.txt
>>>>> file to the Jira ticket that I just created?
>>>>> 
>>>>> https://issues.apache.org/jira/browse/LOG4J2-571
>>>>> 
>>>>> Thanks for looking, this has been driving me nuts.
>>>>> 
>>>>> -Chris
>>>>> 
>>>>> 
>>>>> 
>>>>> On Wed, Mar 19, 2014 at 2:45 PM, Remko Popma <remko.popma@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Just to double-check: when using blocking wait, and running jstack,
>>>>>> does BlockingWaitStrategy appear in the stack trace?
>>>>>> Also, is it possible to double-check (perhaps attach VisualVM) that it
>>>>>> definitely is the AsyncLoggerConfig-1 thread that consumes so much CPU?
>>>>>> 
>>>>>> 
>>>>>> Sent from my iPhone
>>>>>> 
>>>>>>> On 2014/03/19, at 12:31, Chris Graham <chrisgwarp@gmail.com> wrote:
>>>>>>> 
>>>>>>> I have tried both Block and Sleep (the default), but not Yield. No
>>>>>>> discernible difference.
>>>>>>> 
>>>>>>> 
>>>>>>>> On Wed, Mar 19, 2014 at 1:53 PM, Remko Popma <remko.popma@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> As tweeted, I suggest trying the blocking wait strategy. Can you run
>>>>>>>> a jstack dump (and perhaps attach the result to a Jira ticket)? In
>>>>>>>> the attached stack trace below, the AsyncLoggerConfig-1 thread seems
>>>>>>>> to be parked, waiting for a new log event... Doesn't explain high
>>>>>>>> CPU usage...
>>>>>>>> 
>>>>>>>> Sent from my iPhone
>>>>>>>> 
>>>>>>>>> On 2014/03/19, at 10:27, Chris Graham <chrisgwarp@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hello Everyone.
>>>>>>>>> 
>>>>>>>>> In this instance, I'm an indirect user of log4j2 2.0-rc1, as it's
>>>>>>>>> in the web app that I'm using, Apache Archiva 2.0.1.
>>>>>>>>> 
>>>>>>>>> The issue is that when running under WebSphere 8.5.0.2 (obviously
>>>>>>>>> on the IBM JDK, 1.6) on AIX 6.1 TL8, Apache Archiva, when it's
>>>>>>>>> doing nothing, is sitting idle at around 50% CPU.
>>>>>>>>> 
>>>>>>>>> Obviously, this is not good!
>>>>>>>>> 
>>>>>>>>> I've performed the AIX native analysis to get the native thread ID,
>>>>>>>>> mapped it to a Java thread ID, triggered a heap dump, and I've
>>>>>>>>> found this as the culprit:
>>>>>>>>> 
>>>>>>>>> 3XMTHREADINFO      "AsyncLoggerConfig-1" J9VMThread:0x0000000031D14600,
>>>>>>>>> j9thread_t:0x00000100137D8BD0, java/lang/Thread:0x000000004301C508,
>>>>>>>>> state:CW, prio=5
>>>>>>>>> 3XMJAVALTHREAD            (java/lang/Thread getId:0x6A, isDaemon:true)
>>>>>>>>> 3XMTHREADINFO1            (native thread ID:0x2BF00F9, native priority:0x5,
>>>>>>>>> native policy:UNKNOWN)
>>>>>>>>> 3XMHEAPALLOC             Heap bytes allocated since last GC cycle=0 (0x0)
>>>>>>>>> 3XMTHREADINFO3           Java callstack:
>>>>>>>>> 4XESTACKTRACE                at sun/misc/Unsafe.park(Native Method)
>>>>>>>>> 4XESTACKTRACE                at java/util/concurrent/locks/LockSupport.parkNanos(LockSupport.java:332)
>>>>>>>>> 4XESTACKTRACE                at com/lmax/disruptor/SleepingWaitStrategy.applyWaitMethod(SleepingWaitStrategy.java:66)
>>>>>>>>> 4XESTACKTRACE
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Matt Sicker <boards@gmail.com>
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: log4j-user-unsubscribe@logging.apache.org
>>> For additional commands, e-mail: log4j-user-help@logging.apache.org
>>> 
>>> 
>> 


