kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jay Kreps <jay.kr...@gmail.com>
Subject Re: spikes in producer requests/sec
Date Tue, 11 Nov 2014 23:26:34 GMT
There are some fixes in 0.8.2-beta for periodic latency spikes if you are
using acks=-1 in the producer.

-Jay

On Tue, Nov 11, 2014 at 10:50 AM, Wes Chow <wes@chartbeat.com> wrote:

>
> We're seeing periodic spikes in req/sec rates across our nodes. Our
> cluster is 10 nodes, and the topic has a replication factor of 3. We push
> around 200k messages / sec into Kafka.
>
>
> The machines are running the most recent version of Kafka and we're
> connecting via librdkafka. pingstream02-10 are using the CMS garbage
> collector, but I switched pingstream01 to use G1GC under the theory that
> maybe these were GC pauses. The graph shows that likely didn't improve the
> situation.
>
> My next thought is that maybe this is the effect of log rolling. Checking
> in the logs, I see a lot of this:
>
> [2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-342' in 3 ms.
> [2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-186' in 2 ms.
> [2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-253' in 3 ms.
> [2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-209' in 3 ms.
> [2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-299' in 2 ms.
> [2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO
> kafka.log.Log  - Rolled new log segment for 'pings-506' in 2 ms.
>
> The "pings" topic in question has 512 partitions, so it does this 512
> times every so often. We have an effective retention period of a bit less
> than 30 min, so rolling happens pretty frequently. Still, if I assume worst
> case that rolling locks up the process for 2ms and there are 512 rolls
> every few minutes, I'd expect halting to happen for about a second at a
> time. The graphs seem to indicate much longer dips, but it's hard for me to
> know if I'm looking at real data or some sort of artifact.
>
> Fwiw, the producers are not reporting any errors, so it does not seem like
> we're losing data.
>
> I'm new to Kafka. Should I be worried? If so, how should I be debugging
> this?
>
> Thanks,
> Wes
>
>

Mime
  • Unnamed multipart/related (inline, None, 0 bytes)
View raw message