kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: Kafka long tail latency issue
Date Tue, 03 Feb 2015 16:51:23 GMT
If you are on 0.8.1 or higher and are running with replication, consider
disabling the forced log flush. The forced flush is synchronous, so it will
definitely cause latency spikes. You will still get durability from
replication and the background OS flush. On Linux, the background I/O flush
that the OS performs doesn't have much impact.
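
As a rough aid for checking whether a broker still forces flushes, here is a minimal sketch in Python. It assumes the forced flush is governed by the broker settings log.flush.interval.messages and log.flush.interval.ms (the latter is the one tuned in the quoted message), and that leaving both unset leaves flushing to the OS. The helper name forced_flush_overrides and the sample config text are hypothetical, and the parser only handles simple key=value lines.

```python
# Assumption: these two broker settings are what force a synchronous log flush.
FORCED_FLUSH_KEYS = ("log.flush.interval.messages", "log.flush.interval.ms")


def forced_flush_overrides(properties_text):
    """Return the forced-flush settings explicitly set in server.properties text.

    Simplified parser: handles 'key=value' lines and '#' comments only.
    """
    found = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() in FORCED_FLUSH_KEYS:
            found[key.strip()] = value.strip()
    return found


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only.
    sample = """\
# broker settings
broker.id=1
log.flush.interval.messages=10000
log.flush.interval.ms=200
#log.flush.scheduler.interval.ms=3000
"""
    for key, value in forced_flush_overrides(sample).items():
        print(f"forced flush still configured: {key}={value}")
```

If the function returns an empty dict, no forced-flush override is present in the text it was given; commented-out lines are ignored.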

Also, we fixed several significant latency-related bugs from 0.8.1 in the
0.8.2 release, so consider giving that a try.

Finally, Linux write performance is itself highly variable. Even in the
absence of any synchronous flushing, there is some locking around I/O
operations such as allocating new journal blocks. If you are running Linux, I
think we include some tuning options in the ops section of the
documentation that help reduce that. There is a test class,
kafka.TestLinearWriteSpeed, which will benchmark the throughput and latency
using either a plain file or a local Kafka log. It is worth doing this to
get a baseline for how fast and variable things can be in the absence of
any network or coordination.
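
To illustrate the kind of baseline meant here, the following is a minimal Python sketch. It is not kafka.TestLinearWriteSpeed and does not reproduce its options. It appends fixed-size blocks to a plain file, records the latency of each write, and reports the median, 99.9th percentile, and maximum. The function names, block size, write count, and fsync interval are all illustrative assumptions.

```python
import math
import os
import tempfile
import time


def linear_write_latencies(path, num_writes=2000, write_size=4096, fsync_every=0):
    """Append fixed-size blocks to `path`; return per-write latencies in ms.

    fsync_every=0 leaves flushing to the OS. A value N > 0 forces a
    synchronous os.fsync after every N writes, which is included in the
    latency of the write that triggers it.
    """
    block = b"x" * write_size
    latencies_ms = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for i in range(1, num_writes + 1):
            start = time.perf_counter()
            os.write(fd, block)
            if fsync_every and i % fsync_every == 0:
                os.fsync(fd)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return latencies_ms


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an ascending list; fraction is in (0, 1]."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Return median, 99.9th percentile, and maximum latency in ms."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 0.5),
        "p99.9": percentile(ordered, 0.999),
        "max": ordered[-1],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        runs = (("os_flush_only", 0), ("fsync_every_100", 100))
        for label, every in runs:
            path = os.path.join(tmp_dir, label + ".dat")
            stats = summarize(linear_write_latencies(path, fsync_every=every))
            print(label, {k: round(v, 3) for k, v in stats.items()})
```

Comparing the two runs on the broker's own disk gives a rough sense of how much of the tail comes from synchronous flushing alone, before any network or replication is involved. The temporary directory should be pointed at the same filesystem as the Kafka log directory for the numbers to be meaningful.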



On Tue, Feb 3, 2015 at 1:37 AM, Xinyi Su <xinyisu@gmail.com> wrote:

> Hi,
> I am building a Kafka cluster and running the producer perf test to
> measure Kafka latency.
> From the test results, I notice that the long-tail latency is very high and
> increases over time, although the 99.9% result looks very good.
> The worst latency can exceed 1 second. Besides, disk utilization
> is always very low, never more than 1%. I also tried tuning
> log.flush.interval.ms from 1000ms to 200ms. It did not help much.
> Below is the max latency chart. The Y axis represents the max latency in
> milliseconds, and the X axis represents the time elapsed in milliseconds.
> From the chart, we can see the latency gradually increasing from about 10ms
> to 1095ms.
> [image: Inline image]
> The Kafka cluster consists of 4 hosts. The version is 2.9.2-0.8.2-beta.
> The PerfTopic15 topic was created with 3 partitions and a replication
> factor of 3.
> Here is my perf script usage:
> -bash-4.1$ bin/kafka-producer-perf-test.sh --broker-list <broker list>
> --topics PerfTopic15 --sync --initial-message-id 1 --messages 200000
> --csv-reporter-enabled --metrics-dir /tmp/PerfTopic15_1
> --message-send-gap-ms 20 --request-num-acks -1 --batch-size 1
> -bash-4.1$ bin/kafka-topics.sh --zookeeper <zkHost>:2181 --describe
> --topic PerfTopic15
> Topic:PerfTopic15 PartitionCount:3 ReplicationFactor:3 Configs:
> Topic: PerfTopic15 Partition: 0 Leader: 3 Replicas: 3,4,1 Isr: 3,4,1
> Topic: PerfTopic15 Partition: 1 Leader: 4 Replicas: 4,1,2 Isr: 4,1,2
> Topic: PerfTopic15 Partition: 2 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
> I expect the worst latency not to exceed 100 milliseconds, but the test
> result is very discouraging. Do you have any pointers on the Kafka
> long-tail latency issue?
> Hoping for your reply. Thanks in advance!
