kafka-users mailing list archives

From Benjamin Black...@b3k.us>
Subject Re: Kafka 0.7 performance compared to bare metal
Date Fri, 30 Aug 2013 17:56:13 GMT
Your producer test uses a thread per core. Your consumer test uses a single
thread. A single thread is likely insufficient to get maximum throughput.
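
To put the two tests on a like-for-like footing, the aggregate producer figures quoted below can be divided by the number of producer threads. This is only a rough Python sketch: it uses the core counts and throughput numbers from the quoted message, assumes the producer ran with one thread per core as stated there, and assumes throughput splits evenly across threads, which is a simplification. The names are illustrative.

```python
# Figures copied from the quoted message; names are illustrative.
cores = {"m1.xlarge": 4, "hi1.4xlarge": 16, "desktop": 4}

# Producer test ran with --threads [number-of-cores]; values are aggregate MB/s.
producer_mb_s = {"m1.xlarge": 41, "hi1.4xlarge": 217, "desktop": 42}

# Consumer test ran with --threads 1 on a cold IO cache; values are MB/s.
consumer_mb_s = {"m1.xlarge": 25, "hi1.4xlarge": 10, "desktop": 20}


def per_thread(total_mb_s, threads):
    """Average throughput per thread, assuming an even split (a simplification)."""
    return total_mb_s / threads


per_thread_producer = {
    host: per_thread(producer_mb_s[host], cores[host]) for host in cores
}

for host in cores:
    print(
        f"{host}: producer ~{per_thread_producer[host]:.1f} MB/s per thread, "
        f"consumer {consumer_mb_s[host]} MB/s on one thread"
    )
```

The even split probably understates what one producer thread can do: the follow-up quoted below reports 50 MB/s from a single-threaded producer on hi1.4xlarge.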
On Aug 30, 2013 8:46 AM, "Rafael Bagmanov" <bugzmanov@gmail.com> wrote:

> Benjamin, do you mean a thread on the client side? I'm not quite getting
> what I'm limited by. Can you please explain a little bit more?
>
> A single-threaded producer is still capable of doing 50 MB/s on
> hi1.4xlarge.
> That is much slower than the 377 MB/s from a single FIO job, but still
> 5 times faster than what I'm getting from the consumer.
> Is that expected?
>
> Another mystery for me is that with a hot IO cache (the whole topic is
> in memory) I'm getting 50 MB/s - 100 MB/s (this huge std. dev. bugs
> me too) from a single-threaded consumer.
>
> And when the cache is cold, I don't see the Kafka broker making the best
> possible use of the SSD it has.
> I've tried setting fetch-size to 100 MB, but Kafka still hits the disk
> at 10 MB/s. (The disk itself can satisfy many more read requests
> at the same latency and provide much higher throughput.)
>
> To me it looks as if
> http://man7.org/linux/man-pages/man2/sendfile.2.html somehow works
> inefficiently with SSD. And I don't understand why, or how this can be
> fixed.
>
> I do understand that you are advising me to use more partitions and more
> consumer threads. But I would like to know the limits I'm hitting in
> this single-threaded mode.
>
> Thanks!
>
> Rafael Bagmanov,
> Grid Dynamics
>
> 2013/8/30 Benjamin Black <b@b3k.us>:
> > You are maxing out the single consumer thread.
> > On Aug 30, 2013 1:35 AM, "Rafael Bagmanov" <bugzmanov@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am trying to understand how fast Kafka 0.7 is compared to what I can get
> >> from the hard drive. In essence I have 3 questions.
> >>
> >> In all tests below, I'm using a single broker with a single one-partition
> >> topic. The Kafka perf tests were run in 2 deployment configs:
> >> - broker, perf-test on same host
> >> - broker, perf-test on different hosts (the results are practically the
> >> same, so I won't post them here)
> >>
> >>
> >> I'm using FIO (http://freecode.com/projects/fio) to benchmark the speed of
> >> the hard drives.
> >>
> >> Hardware I'm using:
> >> 1) m1.xlarge with ephemeral storage, 4 core cpu, 16 GB ram
> >> 2) hi1.4xlarge  with SSD, 16 core cpu, 64 GB ram
> >> 3) desktop machine with 7200 rpm sata, 4 core cpu, 8 GB ram
> >>
> >> Kafka broker config:
> >> Oracle jdk 1.6.0_38,  -Xmx2048
> >>
> >> socket.send.buffer=16777216
> >> socket.receive.buffer=16777216
> >> max.socket.request.bytes=104857600
> >> log.flush.interval=10000
> >> log.default.flush.interval.ms=1000
> >> log.default.flush.scheduler.interval.ms=1000
> >> num.threads=[num of cores]
> >>
> >>
> >> For kafka-producer-perf-test I'm assuming that IO access pattern is
> >> sequential write.
> >>
> >> Here is the test I ran with FIO:
> >>
> >> [sequential-write]
> >> rw=write
> >> size=50G
> >> ioengine=sync
> >> numjobs=1
> >> directory=/tmp/fio
> >> filename=redo01.log
> >>
> >>
> >> Here is kafka performance test:
> >>
> >> ./bin/kafka-producer-perf-test.sh -topic "perf" --batch-size 3000
> >> --messages 50000000 --message-size 1300 --brokerinfo
> >> broker.list=0:host:9092 --threads [number-of-cores]
> >>
> >>
> >>
> >> ----------------------------------------------------------
> >> |        |   m1.xlarge   |   hi1.4xlarge   |   desktop   |
> >> ----------------------------------------------------------
> >> | kafka  |   41 MB/s     |   217 MB/s      |   42 MB/s   |
> >> ----------------------------------------------------------
> >> | fio    |   106 MB/s    |   377 MB/s      |   74 MB/s   |
> >> ----------------------------------------------------------
> >>
> >>
> >> Question 1: The proportion (~1/2) is pretty stable across the different
> >> kinds of hardware I've tried. Is that expected? Can something be done to
> >> improve this?
> >>
> >> I've tried to play with:
> >> log.flush.interval=10000
> >> log.default.flush.interval.ms=1000
> >> log.default.flush.scheduler.interval.ms=1000
> >>
> >> Like increasing them 10 times, or decreasing them 10 times, but I haven't
> >> seen much of a difference in IO throughput.
> >>
> >> The other thing that bugs me much more is that Kafka consumer speed on a
> >> cold IO cache is about 5-50 times slower than what I can get with the
> >> "sequential read" fio test.
> >>
> >> For kafka-consumer-perf-test I'm assuming that IO access pattern is
> >> sequential read.
> >>
> >> Here is FIO test:
> >>
> >> [sequential-read]
> >> rw=read
> >> size=50G
> >> ioengine=sync   # I know that kafka uses sendfile, but sync should be slower, right?
> >> numjobs=1
> >> directory=/tmp/fio
> >> filename=redo01.log
> >>
> >> Here is what I'm doing with kafka-consumer-perf-test:
> >>
> >> kafka-consumer-perf-test.sh -topic "perf" --messages 50000000 --zookeeper
> >> host:2181 --threads 1 --socket-buffer-size 16777216 --fetch-size 16777216
> >>
> >> The broker config is the same.
> >>
> >> I'm dropping the IO cache before running the tests: echo 3 >
> >> /proc/sys/vm/drop_caches
> >>
> >>
> >>
> >> ------------------------------------------------------------
> >> |        |   m1.xlarge   |   hi1.4xlarge     |   desktop   |
> >> ------------------------------------------------------------
> >> | kafka  |   25 MB/s     |   10 MB/s (???)   |   20 MB/s   |
> >> ------------------------------------------------------------
> >> | fio    |   130 MB/s    |   450 MB/s        |   67 MB/s   |
> >> ------------------------------------------------------------
> >>
> >> Question 2: Can something be done to improve consumer performance?
> >>
> >> Question 3 (most important for me): What might be the reasons for the
> >> consumer to behave so badly on the fastest hardware available? I see in
> >> iostat that the consumer really issues very few read requests to the
> >> drive:
> >>
> >> Device:  rrqm/s  wrqm/s     r/s   w/s    rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> >> xvdb       0.00    0.00  144.00  0.00  6144.00   0.00     85.33      0.06   0.42     0.42     0.00   0.08   1.20
> >>
> >> And the CPUs are idling:
> >>
> >> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >>            2.16    0.00    0.09    0.06    0.03   97.66
> >>
> >>
> >> Besides that, even if the whole topic is in the IO cache, the consumer
> >> speed is about 45 MB/s, which is still well below my expectations.
> >>
> >> And the picture doesn't change across deployment configs (broker and
> >> test on the same node or on 2 different nodes).
> >>
> >> Any ideas why this might happen?
> >>
> >> Rafael Bagmanov,
> >> Grid Dynamics.
> >>
>
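
The iostat row quoted above can be sanity-checked with a little arithmetic, which also gives a feel for how lightly the device is loaded. This is a minimal Python sketch that uses only the values from that row; the variable names are illustrative, and it relies on iostat reporting avgrq-sz in 512-byte sectors. The in-flight estimate is Little's law (arrival rate times time in system), applied here as an approximation.

```python
# Values copied from the quoted iostat row for xvdb; names are illustrative.
reads_per_s = 144.00      # r/s
read_kb_per_s = 6144.00   # rkB/s
await_ms = 0.42           # average time a request spends queued + serviced

SECTOR_BYTES = 512        # iostat reports avgrq-sz in 512-byte sectors

# Average size of one read request, in kB and in sectors.
avg_request_kb = read_kb_per_s / reads_per_s
avg_request_sectors = avg_request_kb * 1024 / SECTOR_BYTES

# Little's law: average requests in flight = arrival rate * time in system.
avg_in_flight = reads_per_s * (await_ms / 1000.0)

# Read throughput in MB/s (1 MB taken as 1024 kB).
read_mb_per_s = read_kb_per_s / 1024

print(f"average request size: {avg_request_kb:.2f} kB "
      f"({avg_request_sectors:.2f} sectors)")
print(f"average requests in flight: {avg_in_flight:.2f}")
print(f"read throughput: {read_mb_per_s:.1f} MB/s")
```

Both derived values agree with the reported avgrq-sz (85.33) and avgqu-sz (0.06). Together with %util at 1.20, they suggest that during this sample the device was usually waiting for the next request rather than working through a backlog.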
