kafka-users mailing list archives

From Yuheng Du <yuheng.du.h...@gmail.com>
Subject Re: latency test
Date Fri, 04 Sep 2015 17:55:45 GMT
Thanks for your reply, Erik. I am running some more tests based on your
suggestions now and will share my results here. Is it necessary to use a
fixed number of partitions (32 partitions, maybe) for my test?

I am testing scenarios with 2, 4, 8, 16, and 32 brokers, each running on
its own physical node. So would using at least 32 partitions make more
sense? I have seen latencies increase as the number of partitions goes up
in my experiments.
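Why a partition count at least as large as the broker count matters can be sketched with a simplified model of partition leadership. Assumption: partition p is led by broker (p % brokers); real Kafka starts the assignment at a random broker and also places follower replicas, so this only approximates the leader spread. The class name LeaderSpread is hypothetical.

```java
import java.util.Arrays;

// Simplified model (an assumption, not Kafka's real placement logic):
// partition p is led by broker (p % brokers). Only leaders are counted;
// follower replicas are ignored.
class LeaderSpread {
    // Number of partitions led by each broker.
    static int[] leadersPerBroker(int partitions, int brokers) {
        int[] counts = new int[brokers];
        for (int p = 0; p < partitions; p++) {
            counts[p % brokers]++;
        }
        return counts;
    }

    // Brokers that lead no partition receive no produce requests for this topic.
    static int idleBrokers(int partitions, int brokers) {
        return (int) Arrays.stream(leadersPerBroker(partitions, brokers))
                .filter(c -> c == 0)
                .count();
    }

    public static void main(String[] args) {
        for (int partitions : new int[] {8, 32}) {
            for (int brokers : new int[] {2, 4, 8, 16, 32}) {
                int busiest = Arrays.stream(leadersPerBroker(partitions, brokers)).max().getAsInt();
                System.out.printf("partitions=%d brokers=%d busiest=%d idle=%d%n",
                        partitions, brokers, busiest, idleBrokers(partitions, brokers));
            }
        }
    }
}
```

Under this model, 8 partitions on 32 brokers leaves 24 brokers leading nothing, while 32 partitions on 32 brokers gives each broker exactly one leader. With a replication factor above 1, the "idle" brokers may still host follower replicas.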

To record the latency of each event, are you suggesting that I write my
own test program (in Java, perhaps), or can I just modify the standard
test program provided by Kafka (
https://gist.github.com/jkreps/c7ddb4041ef62a900e6c )? I guess I would
need to rebuild the source if I modify the standard Java test program
ProducerPerformance provided in Kafka, right? Right now this program
reports only average and percentile latencies, not per-event latencies.
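Per-event recording can be kept out of the hot path by buffering in memory and writing the CSV only after the run. A minimal sketch, assuming you capture System.nanoTime() just before each send and again when the acknowledgement arrives; the class EventLatencyLog and its methods are hypothetical and not part of Kafka or ProducerPerformance.

```java
import java.util.Locale;

// Hypothetical helper: stores per-event data in preallocated arrays so the
// producer's completion callback does no file I/O or allocation; the CSV is
// built only after the run finishes.
class EventLatencyLog {
    private final long[] sendNanos;
    private final long[] latencyNanos;
    private int count = 0;

    EventLatencyLog(int capacity) {
        sendNanos = new long[capacity];
        latencyNanos = new long[capacity];
    }

    // Call once per acknowledged record; events beyond capacity are dropped.
    synchronized void record(long sendTimeNanos, long ackTimeNanos) {
        if (count < sendNanos.length) {
            sendNanos[count] = sendTimeNanos;
            latencyNanos[count] = ackTimeNanos - sendTimeNanos;
            count++;
        }
    }

    synchronized int size() {
        return count;
    }

    // One row per event: send time relative to the earliest send, and latency, both in ms.
    synchronized String toCsv() {
        long start = Long.MAX_VALUE;
        for (int i = 0; i < count; i++) {
            start = Math.min(start, sendNanos[i]);
        }
        StringBuilder sb = new StringBuilder("send_offset_ms,latency_ms\n");
        for (int i = 0; i < count; i++) {
            sb.append(String.format(Locale.ROOT, "%.3f,%.3f\n",
                    (sendNanos[i] - start) / 1e6, latencyNanos[i] / 1e6));
        }
        return sb.toString();
    }
}
```

In a modified benchmark, record(...) would be called from the Callback passed to KafkaProducer.send(), and after the run the result could be saved with java.nio.file.Files.writeString(path, log.toCsv()) (Java 11+). Each event costs 16 bytes, so 500k events per producer is about 8 MB.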

Thanks.

On Fri, Sep 4, 2015 at 1:42 PM, Helleren, Erik <Erik.Helleren@cmegroup.com>
wrote:

> That is an excellent question!  There are a bunch of ways to monitor
> jitter and see when that is happening.  Here are a few:
>
> - You could slice the histogram every few seconds, save each slice with a
> timestamp, and then compare them.  The comparison is mostly manual, or
> you can graph the percentiles over time as line charts in Excel, with
> each percentile as a series.  If you are using HDR Histogram, look at
> how to use its Recorder class together with a
> ScheduledExecutorService.
>
> - You can just save the starting timestamp and the latency of each
> event.  If you put them into a CSV, you can load it into Excel and graph
> it as an XY chart.  That way you can see every point during the run of
> your program and spot trends.  Be careful with this one, especially
> about writing to a file inside the callback that Kafka provides.
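The histogram-slicing approach above can be sketched with the JDK alone. This is a simplified stand-in for HDR Histogram's Recorder, not its API: the class IntervalLatencyHistogram, the 1 ms bucket width, and the overflow bucket are assumptions, and unlike Recorder a write racing with the swap can land in the interval that was just read.

```java
import java.util.Locale;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stdlib stand-in for an interval recorder: 1 ms buckets, with the
// last bucket collecting everything >= maxMs.
class IntervalLatencyHistogram {
    private final int maxMs;
    private final AtomicReference<AtomicLongArray> current;

    IntervalLatencyHistogram(int maxMs) {
        this.maxMs = maxMs;
        this.current = new AtomicReference<>(new AtomicLongArray(maxMs + 1));
    }

    void record(long latencyMs) {
        int bucket = (int) Math.min(Math.max(latencyMs, 0L), maxMs);
        current.get().incrementAndGet(bucket);
    }

    // Swap in an empty histogram and return the counts for the interval that just ended.
    long[] snapshotAndReset() {
        AtomicLongArray old = current.getAndSet(new AtomicLongArray(maxMs + 1));
        long[] counts = new long[old.length()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = old.get(i);
        }
        return counts;
    }

    // Nearest-rank percentile in ms; -1 if the interval had no events.
    static long percentileMs(long[] counts, double p) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) return -1;
        long rank = Math.max(1L, (long) Math.ceil(p / 100.0 * total));
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) return i;
        }
        return counts.length - 1;
    }

    // Print "epochMillis,p50,p99,p99.9" once per period, ready for a line chart.
    ScheduledFuture<?> reportEvery(ScheduledExecutorService timer, long periodSeconds) {
        return timer.scheduleAtFixedRate(() -> {
            long[] c = snapshotAndReset();
            System.out.printf(Locale.ROOT, "%d,%d,%d,%d%n", System.currentTimeMillis(),
                    percentileMs(c, 50), percentileMs(c, 99), percentileMs(c, 99.9));
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

Typical use would be hist.reportEvery(Executors.newSingleThreadScheduledExecutor(), 5) at startup and hist.record(latencyMs) from the producer callback; each printed row is one slice.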
>
> Also, I have noticed that most of the very slow observations happen at
> startup.  But don’t trust me; trust the data and share your findings.
> Also, the 99.9th percentile is a pretty good standard for typical
> poor-case performance.  The average is borderline useless; the 50th
> percentile is a better measure of the typical case because it says “half
> of events will be this slow or faster”, and for high values like the
> 99.9th percentile, “0.1% of all events will be slower than this”.
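To see why the average can mislead while percentiles stay interpretable, here is a small sketch using nearest-rank percentiles on a hypothetical sample (998 events at 4 ms and 2 stalls at 1000 ms); the numbers are invented for illustration, not taken from the tests in this thread.

```java
import java.util.Arrays;

class LatencySummary {
    // Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
    static long percentile(long[] latencies, double p) {
        if (latencies.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = latencies.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        rank = Math.min(Math.max(rank, 1), sorted.length);
        return sorted[rank - 1];
    }

    static double mean(long[] latencies) {
        return Arrays.stream(latencies).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Hypothetical sample: 998 events at 4 ms and 2 stalls at 1000 ms.
        long[] ms = new long[1000];
        Arrays.fill(ms, 4L);
        ms[10] = 1000;
        ms[500] = 1000;
        System.out.println("mean  = " + mean(ms));
        System.out.println("p50   = " + percentile(ms, 50));
        System.out.println("p99.9 = " + percentile(ms, 99.9));
    }
}
```

Here the mean (about 6 ms) describes no actual event: the median says a typical event takes 4 ms, and the 99.9th percentile exposes the 1000 ms stalls.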
> -Erik
>
> On 9/4/15, 12:05 PM, "Yuheng Du" <yuheng.du.hust@gmail.com> wrote:
>
> >Thank you, Erik! That is helpful!
> >
> >But I also see jitter in the maximum latencies when running the
> >experiment.
> >
> >The average end-to-acknowledgement latency from producer to broker is
> >around 5 ms when using 92 producers and 4 brokers, and the 99.9th
> >percentile latency is 58 ms, but the maximum latency goes up to 1359 ms.
> >How can I locate the source of this jitter?
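One way to start locating a maximum-latency outlier is to list the slowest events together with when they were sent, and check whether they cluster (for example at startup) or are spread across the run. A minimal sketch, assuming per-event send offsets and latencies are already available as arrays; SlowestEvents is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: list the k slowest events with their send offsets.
class SlowestEvents {
    // Indices of the k slowest events, slowest first (ties keep input order).
    static int[] slowestIndices(long[] latencyMs, int k) {
        Integer[] idx = new Integer[latencyMs.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingLong((Integer j) -> latencyMs[j]).reversed());
        int n = Math.min(k, idx.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = idx[i];
        return out;
    }

    static String report(long[] sendOffsetMs, long[] latencyMs, int k) {
        StringBuilder sb = new StringBuilder();
        for (int i : slowestIndices(latencyMs, k)) {
            sb.append("sent at +").append(sendOffsetMs[i]).append(" ms, latency ")
              .append(latencyMs[i]).append(" ms\n");
        }
        return sb.toString();
    }
}
```

If the slowest events all have small send offsets, a warm-up effect is the likely suspect; if they are spread out, correlating their timestamps with broker-side logs or GC pauses is a reasonable next step.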
> >
> >Thanks.
> >
> >On Fri, Sep 4, 2015 at 10:54 AM, Helleren, Erik
> ><Erik.Helleren@cmegroup.com>
> >wrote:
> >
> >> Well… not to be contrarian, but latency depends much more on the
> >> latency between the producer and the broker that is the leader for the
> >> partition you are publishing to, at least when your brokers are not
> >> saturated with messages and acks is set to 1.  If acks is set to all,
> >> latency on a non-saturated Kafka cluster will be: round-trip latency
> >> from the producer to the leader for the partition + the slowest
> >> round-trip latency to any replica of that partition.  If a cluster is
> >> saturated with messages, we have to assume that all partitions receive
> >> an equal distribution of messages to avoid linear algebra and queueing
> >> theory models.  I don’t like linear algebra :P
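Erik's no-saturation model can be written compactly. The notation is introduced here, not in the original: p is the producer, ℓ the partition leader, R the partition's in-sync replica set, and replica round trips are assumed to be measured from the leader. It is his simplified approximation and ignores producer batching, broker processing and disk time, and the fact that followers replicate by fetching from the leader.

```latex
L_{\mathrm{acks}=1} \approx \mathrm{RTT}(p, \ell)

L_{\mathrm{acks}=\mathrm{all}} \approx \mathrm{RTT}(p, \ell) + \max_{r \in R \setminus \{\ell\}} \mathrm{RTT}(\ell, r)
```

For instance, with a hypothetical 1 ms producer-to-leader round trip and follower round trips of 0.5 ms and 2 ms, the model gives about 1 ms for acks=1 and about 1 + max(0.5, 2) = 3 ms for acks=all.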
> >>
> >> Since you are probably putting all your latencies into a single
> >> histogram per producer, or worse, just an average, this pattern would
> >> have been obscured.  Obligatory lecture about measuring latency by Gil
> >> Tene (https://www.youtube.com/watch?v=9MKY4KypBzg).  To verify this
> >> hypothesis, you should rewrite the benchmark to record the latency of
> >> each write into a separate histogram per producer and partition (HDR
> >> Histogram is pretty good for that).  This would give you
> >> producers*partitions histograms, which might be unwieldy for that many
> >> producers.  But wait, there is hope!
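Keeping one histogram per partition for a single producing client can look like the sketch below. The class PerPartitionHistograms, its 1 ms buckets and overflow bucket are assumptions; in a Kafka producer callback the partition number would come from RecordMetadata.partition().

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for ONE producing client: a separate 1 ms-bucket
// histogram per partition, so partitions led by different brokers can be compared.
class PerPartitionHistograms {
    private final int maxMs;
    private final Map<Integer, long[]> byPartition = new TreeMap<>();

    PerPartitionHistograms(int maxMs) {
        this.maxMs = maxMs;
    }

    // The last bucket collects everything >= maxMs.
    synchronized void record(int partition, long latencyMs) {
        long[] h = byPartition.computeIfAbsent(partition, p -> new long[maxMs + 1]);
        h[(int) Math.min(Math.max(latencyMs, 0L), maxMs)]++;
    }

    // Deep copy so the data can be analysed while recording continues.
    synchronized Map<Integer, long[]> snapshot() {
        Map<Integer, long[]> copy = new TreeMap<>();
        byPartition.forEach((p, h) -> copy.put(p, h.clone()));
        return copy;
    }
}
```

Restricting this to one producer keeps the number of histograms equal to the partition count rather than producers*partitions.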
> >>
> >> To verify that this hypothesis holds, you just have to see that there
> >> is a significant difference between different partitions on a SINGLE
> >> producing client.  So pick one producing client at random and use the
> >> data from that.  The easy way to do that is to plot all the partition
> >> latency histograms on top of each other in the same plot; that way you
> >> have a pretty plot to show people.  If you don’t want to set up
> >> plotting, you can just compare the medians (50th percentiles) of the
> >> partitions’ histograms.  If there is a lot of variance, your latency
> >> anomaly is explained by brokers 4-7 being slower than nodes 0-3!  If
> >> there isn’t a lot of variance at the 50th percentile, look at higher
> >> percentiles.  And if the higher percentiles for all the partitions look
> >> the same, this hypothesis is disproved.
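Comparing medians across one producer's per-partition histograms can be sketched as follows. PartitionMedians is a hypothetical helper that assumes histograms with 1 ms buckets indexed by latency in ms; what counts as "a lot of variance" is left to judgment.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: nearest-rank median of each partition's 1 ms-bucket
// histogram, plus the spread between the fastest and slowest partition medians.
class PartitionMedians {
    static long medianMs(long[] counts) {
        long total = 0;
        for (long c : counts) total += c;
        if (total == 0) return -1;
        long rank = (total + 1) / 2; // ceil(total / 2)
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) return i;
        }
        return counts.length - 1;
    }

    static Map<Integer, Long> medians(Map<Integer, long[]> byPartition) {
        Map<Integer, Long> out = new TreeMap<>();
        byPartition.forEach((p, h) -> out.put(p, medianMs(h)));
        return out;
    }

    // A large spread supports the "some leaders are slower" hypothesis;
    // a small one means looking at higher percentiles next.
    static long spreadMs(Map<Integer, long[]> byPartition) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        for (long m : medians(byPartition).values()) {
            if (m < 0) continue; // skip partitions with no data
            min = Math.min(min, m);
            max = Math.max(max, m);
        }
        return max < min ? 0 : max - min;
    }
}
```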
> >>
> >> If you want to make a general statement about the latency of writing
> >> to Kafka, you can merge all the histograms into a single histogram and
> >> plot that.
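Merging works because adding bucket counts yields the exact histogram of the combined data, whereas averaging per-partition percentiles does not. A sketch for histograms that share one bucket layout (HistogramMerge is hypothetical; HDR Histogram offers its own add operation for this).

```java
import java.util.List;

// Merge histograms with the same bucket layout by adding bucket counts.
class HistogramMerge {
    static long[] merge(List<long[]> histograms) {
        if (histograms.isEmpty()) return new long[0];
        int buckets = histograms.get(0).length;
        long[] merged = new long[buckets];
        for (long[] h : histograms) {
            if (h.length != buckets) throw new IllegalArgumentException("bucket layouts differ");
            for (int i = 0; i < buckets; i++) merged[i] += h[i];
        }
        return merged;
    }
}
```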
> >>
> >> To Yuheng’s credit, more brokers always result in more throughput.
> >> But throughput and latency are two different creatures.  It’s worth
> >> noting that Kafka is designed to be high throughput first and low
> >> latency second.  And it does a really good job at both.
> >>
> >> Disclaimer: I might not like linear algebra, but I do like statistics.
> >> Let me know if there are topics above that need more explanation and
> >> aren’t covered by Gil’s lecture.
> >> -Erik
> >>
> >> On 9/4/15, 9:03 AM, "Yuheng Du" <yuheng.du.hust@gmail.com> wrote:
> >>
> >> >When I use 32 partitions, the latency with 4 brokers becomes larger
> >> >than the latency with 8 brokers.
> >> >
> >> >So is it always true that using more brokers gives lower latency when
> >> >the number of partitions is at least the number of brokers?
> >> >
> >> >Thanks.
> >> >
> >> >On Thu, Sep 3, 2015 at 10:45 PM, Yuheng Du <yuheng.du.hust@gmail.com>
> >> >wrote:
> >> >
> >> >> I am running a producer latency test. When using 92 producers on 92
> >> >> physical nodes publishing to 4 brokers, the latency is slightly lower
> >> >> than when using 8 brokers. I am using 8 partitions for the topic.
> >> >>
> >> >> I have rerun the test and it gives me the same result: the 4-broker
> >> >> scenario still has lower latency than the 8-broker scenario.
> >> >>
> >> >> It is weird because I tested 1, 2, 4, 8, 16, and 32 brokers. In all
> >> >> the other cases, the latency decreases as the number of brokers
> >> >> increases.
> >> >>
> >> >> 4 brokers vs. 8 brokers is the only pair that doesn't follow this
> >> >> rule. What could be the cause?
> >> >>
> >> >> I am using 200-byte messages, and the test has each producer publish
> >> >> 500k messages to a given topic. Every time I change the number of
> >> >> brokers, I use a new topic.
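The test size can be turned into a back-of-the-envelope load estimate. A sketch under stated assumptions: payload bytes only (no record or batch overhead, no replication traffic), messages spread evenly over the 8 partitions (the equal-distribution assumption from the thread), and leaders spread round-robin so the busiest broker leads ceil(partitions / brokers) partitions. TestVolume is hypothetical.

```java
// Back-of-the-envelope payload volume for the test described above.
class TestVolume {
    static long totalMessages(int producers, long messagesPerProducer) {
        return producers * messagesPerProducer;
    }

    // Bytes received by the broker leading the most partitions.
    static long busiestLeaderBytes(long totalBytes, int partitions, int brokers) {
        long partitionsOnBusiest = (partitions + brokers - 1) / brokers; // ceil
        return totalBytes / partitions * partitionsOnBusiest;
    }

    public static void main(String[] args) {
        long bytes = totalMessages(92, 500_000) * 200; // 46,000,000 messages of 200 bytes
        for (int brokers : new int[] {4, 8, 16}) {
            System.out.printf("brokers=%d busiest leader receives %d bytes%n",
                    brokers, busiestLeaderBytes(bytes, 8, brokers));
        }
    }
}
```

Under these assumptions, the busiest leader receives 2.3 GB of payload with 4 brokers but 1.15 GB with 8 or 16, so with 8 partitions adding brokers beyond 8 does not reduce the per-leader load.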
> >> >>
> >> >> Thanks for any advice.
> >> >>
> >>
> >>
>
>
