storm-user mailing list archives

From Roee Shenberg <shenb...@alooma.io>
Subject Re: Performance issues on idle trident topology
Date Sun, 17 Sep 2017 07:05:45 GMT
Unfortunately, I haven't found a reliable way to replicate this behavior,
so I've moved on for now. Will update if the behavior recurs.

On Mon, Sep 4, 2017 at 10:46 PM, Roshan Naik <roshan@hortonworks.com> wrote:

> Roee, have you found a resolution? If not, please bring it up on the dev
> list.
>
> -roshan
>
>
>
> *From: *Roee Shenberg <shenberg@alooma.io>
> *Reply-To: *"user@storm.apache.org" <user@storm.apache.org>
> *Date: *Wednesday, August 30, 2017 at 5:43 AM
>
> *To: *"user@storm.apache.org" <user@storm.apache.org>
> *Subject: *Re: Performance issues on idle trident topology
>
>
>
> Hi Roshan,
>
>
>
> The only unusual setting I'm using is
> topology.disruptor.batch.timeout.millis, set to 20 ms instead of the
> default 1 ms, following my previous performance issues.
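>
> (A minimal sketch of how that override can be applied, assuming the storm
> CLI's -c config flag; the jar and class names are placeholders, not from
> this thread:)
>
> # cluster-wide, in storm.yaml on each node:
> #   topology.disruptor.batch.timeout.millis: 20
> #
> # or per topology at submit time:
> storm jar mytopology.jar com.example.MyTopology \
>     -c topology.disruptor.batch.timeout.millis=20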
>
>
>
> Roee
>
>
>
>
>
> On Wed, Aug 30, 2017 at 3:50 AM, Roshan Naik <roshan@hortonworks.com>
> wrote:
>
> Is it running with default settings? If not, you may want to share any
> non-default settings being used. Somebody (more familiar with this than
> myself) might be able to chime in.
>
> -roshan
>
>
>
> *From: *Roee Shenberg <shenberg@alooma.io>
> *Reply-To: *"user@storm.apache.org" <user@storm.apache.org>
> *Date: *Tuesday, August 29, 2017 at 9:03 AM
> *To: *"user@storm.apache.org" <user@storm.apache.org>
> *Subject: *Re: Performance issues on idle trident topology
>
>
>
>
>
> Hi,
>
>
>
> I'm running Storm 1.0.2 + Trident, and I have an issue with idle topologies
> consuming a lot of CPU (around four full cores) after having processed
> some messages.
>
> I'm not quite sure what the root cause is, because it takes some activity
> on the topology to trigger the behavior.
>
>
>
> My research into the cause has reached a dead end, since the issue is not
> easy to reproduce (it only happens reliably after a few days of idle
> time).
>
>
>
> I managed to make the issue happen on a 1-machine cluster with only one
> topology running on it, and saw the following:
>
>
>
> sudo strace -f -c -p 24996
>
> strace: Process 24996 attached with 242 threads
>
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  77.96  228.512766         260    878277           epoll_wait
>  17.78   52.106948       76628       680       180 futex
>   3.73   10.924000     1365500         8         8 restart_syscall
>   0.20    0.584934        2600       225           epoll_ctl
>   0.20    0.574205        1704       337           read
>   0.14    0.416201        1850       225           write
>   0.00    0.000000           0         1           stat
>   0.00    0.000000           0         1           mmap
>   0.00    0.000000           0         1           rt_sigprocmask
>   0.00    0.000000           0         1           rt_sigreturn
>   0.00    0.000000           0         1           madvise
> ------ ----------- ----------- --------- --------- ----------------
> 100.00  293.119054                879757       188 total
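>
> (A sketch of one way to narrow down which of the 242 threads are spinning:
> pair top's per-thread view with a jstack dump; the TID below is a
> placeholder:)
>
> # per-thread CPU usage for the worker; note the TIDs of the busy threads
> top -H -p 24996
>
> # jstack reports native thread ids in hex as nid=0x..., so convert and grep
> TID=12345   # hypothetical thread id taken from the top output
> jstack 24996 | grep -A 20 "nid=$(printf '0x%x' "$TID")"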
>
>
>
> That's an unreasonable number of epoll_wait calls, and I noticed a large
> number of ZooKeeper connections: netstat -n | grep 2181 | wc -l returned
> 25 connections.
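>
> (The server side can confirm what those connections are; a sketch using
> ZooKeeper's four-letter-word commands, assuming they're enabled and ZK is
> on the default local port:)
>
> # list every client connection/session the ZK server currently holds
> echo cons | nc localhost 2181
>
> # or just the summary counters
> echo stat | nc localhost 2181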
>
>
>
> The topology statistics were ~2400 tuples emitted and ~3000 tuples
> transferred in the 10-minute window. Basically idle.
>
>
>
> Next, I produced a flamegraph (including kernel time) while tracing the
> process. I couldn't attach it as a fully functional, clickable SVG, so I
> took a screenshot of it:
>
> [image: flamegraph screenshot of the worker process]
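>
> (For reference, a flamegraph with kernel frames can be produced roughly
> like this, assuming perf, perf-map-agent for the JIT-compiled Java
> symbols, and Brendan Gregg's FlameGraph scripts; the paths are
> assumptions:)
>
> # sample on-CPU stacks (user + kernel) for 60 seconds
> sudo perf record -F 99 -g -p 24996 -- sleep 60
>
> # write /tmp/perf-24996.map so perf can resolve JIT-compiled frames
> ./perf-map-agent/bin/create-java-perf-map.sh 24996
>
> # fold the stacks and render the SVG
> sudo perf script | ./FlameGraph/stackcollapse-perf.pl \
>     | ./FlameGraph/flamegraph.pl > flamegraph.svg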
>
>
> The flamegraph shows that the entire time is spent in
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run.
>
>
>
> The time is split roughly 50/50 between sys_epoll_wait in the kernel and
> Java code, mostly the Java side of select, with a non-negligible amount
> of time spent in
> org/apache/zookeeper/ClientCnxn$SendThread:::clientTunneledAuthenticationInProgress.
>
>
>
> This looks like the ZooKeeper client is malfunctioning somehow, but I'm
> not sure how.
>
>
>
> Do you have any advice on knobs to tweak in the Storm or ZK config, or
> additional diagnostics to run? (The relevant ZK setting, AFAIK, is the
> tick interval, which is at its default of 2 seconds.)
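>
> (For what it's worth, a sketch of inspecting the ZK side with the
> four-letter-word commands, assuming they're enabled; host/port are
> assumptions:)
>
> # server metrics: latency, outstanding requests, alive connections
> echo mntr | nc localhost 2181
>
> # the configuration the server is actually running with (incl. tickTime)
> echo conf | nc localhost 2181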
>
>
>
> Thanks,
>
> Roee
>
>
>
>
>
