kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wes Chow <...@chartbeat.com>
Subject spikes in producer requests/sec
Date Tue, 11 Nov 2014 18:50:00 GMT

We're seeing periodic spikes in req/sec rates across our nodes. Our 
cluster is 10 nodes, and the topic has a replication factor of 3. We 
push around 200k messages / sec into Kafka.


The machines are running the most recent version of Kafka and we're 
connecting via librdkafka. pingstream02-10 are using the CMS garbage 
collector, but I switched pingstream01 to use G1GC under the theory that 
maybe these were GC pauses. The graph shows that likely didn't improve 
the situation.

My next thought is that maybe this is the effect of log rolling. 
Checking in the logs, I see a lot of this:

[2014-11-11 13:46:45,836] 72952071 [ReplicaFetcherThread-0-7] INFO  
kafka.log.Log  - Rolled new log segment for 'pings-342' in 3 ms.
[2014-11-11 13:46:47,116] 72953351 [kafka-request-handler-0] INFO  
kafka.log.Log  - Rolled new log segment for 'pings-186' in 2 ms.
[2014-11-11 13:46:48,155] 72954390 [ReplicaFetcherThread-0-8] INFO  
kafka.log.Log  - Rolled new log segment for 'pings-253' in 3 ms.
[2014-11-11 13:46:48,408] 72954643 [ReplicaFetcherThread-0-4] INFO  
kafka.log.Log  - Rolled new log segment for 'pings-209' in 3 ms.
[2014-11-11 13:46:48,436] 72954671 [ReplicaFetcherThread-0-4] INFO  
kafka.log.Log  - Rolled new log segment for 'pings-299' in 2 ms.
[2014-11-11 13:46:48,687] 72954922 [kafka-request-handler-0] INFO  
kafka.log.Log  - Rolled new log segment for 'pings-506' in 2 ms.

The "pings" topic in question has 512 partitions, so it does this 512 
times every so often. We have an effective retention period of a bit 
less than 30 min, so rolling happens pretty frequently. Still, if I 
assume worst case that rolling locks up the process for 2ms and there 
are 512 rolls every few minutes, I'd expect halting to happen for about 
a second at a time. The graphs seem to indicate much longer dips, but 
it's hard for me to know if I'm looking at real data or some sort of 
artifact.

Fwiw, the producers are not reporting any errors, so it does not seem 
like we're losing data.

I'm new to Kafka. Should I be worried? If so, how should I be debugging 
this?

Thanks,
Wes


Mime
View raw message