kafka-users mailing list archives

From: Andreas Flinck <andreas.fli...@digitalroute.com>
Subject: Re: What is the benefit of using acks=all and min.insync.replicas over e.g. acks=3
Date: Tue, 01 Dec 2015 10:30:23 GMT
Hi

We have run the tests with your proposed properties, but with the same result. However, we
noticed that the Kafka broker only seems to run on 1 out of 72 cores, at around 600% CPU usage.
It is obviously overloading one core without scaling its threading.

The test environment is running Red Hat 6.7 and Java 1.8.0_65.

Do you have any idea why the broker process is not scaling across cores? Are there any more Kafka
broker properties or OS-level settings that could solve this issue?

Thanks in advance!

Andreas


On 28 Nov 2015, at 17:45, Prabhjot Bharaj <prabhbharaj@gmail.com> wrote:


Hi,

Of all the parameters, keeping num.replica.fetchers higher, e.g. at 4, can be of help.
Please try it out and let us know if it worked.
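
Something along these lines in each broker's server.properties would be a starting point (the
value 4 is only an example to experiment with, not a verified recommendation, and the brokers
need a restart to pick it up):

num.replica.fetchers=4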

Thanks,
Prabhjot

On Nov 28, 2015 4:59 PM, "Andreas Flinck" <andreas.flinck@digitalroute.com> wrote:
Hi!

Here are our settings for the properties requested:

num.network.threads=3
socket.request.max.bytes=104857600
socket.receive.buffer.bytes=1048576
socket.send.buffer.bytes=1048576

We don't set the following properties at all, so I guess they will take the defaults from the
documentation (shown in parentheses):

"num.replica.fetchers": (1)
"replica.fetch.wait.max.ms<http://replica.fetch.wait.max.ms/>": (500),
"num.recovery.threads.per.data.dir": (1)

The producer properties we explicitly set are the following:

block.on.buffer.full=false
client.id=MZ
max.request.size=1048576
acks=all
retries=0
timeout.ms=30000
buffer.memory=67108864
metadata.fetch.timeout.ms=3000
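
To give a fuller picture, this is roughly how we create the producer with these settings. It is
only a minimal sketch: the bootstrap servers, serializers, topic name and payload below are
placeholders, not our actual code.

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder bootstrap servers and serializers; our real values differ
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        // The producer properties listed above
        props.put("acks", "all");
        props.put("retries", "0");
        props.put("timeout.ms", "30000");
        props.put("buffer.memory", "67108864");
        props.put("max.request.size", "1048576");
        props.put("metadata.fetch.timeout.ms", "3000");
        props.put("block.on.buffer.full", "false");
        props.put("client.id", "MZ");

        Producer<byte[], byte[]> producer = new KafkaProducer<byte[], byte[]>(props);
        byte[] payload = new byte[100]; // dummy payload
        // Asynchronous send; with block.on.buffer.full=false a full buffer
        // results in BufferExhaustedException instead of blocking
        producer.send(new ProducerRecord<byte[], byte[]>("example-topic", payload));
        producer.close();
    }
}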

Do let me know what you think about it! We are currently setting up some tests using the broker
properties that you suggested.

Regards
Andreas






________________________________________
From: Prabhjot Bharaj <prabhbharaj@gmail.com>
Sent: 28 November 2015 11:37
To: users@kafka.apache.org
Subject: Re: What is the benefit of using acks=all and min.insync.replicas over e.g. acks=3

Hi,

Clogging can happen if, as seems to be the case for you, the requests are network-bound.
Just to confirm your configuration, does your broker configuration look
like this?

"num.replica.fetchers": 4,
"replica.fetch.wait.max.ms<http://replica.fetch.wait.max.ms/>": 500,
"num.recovery.threads.per.data.dir": 4,


"num.network.threads": 8,
"socket.request.max.bytes": 104857600,
"socket.receive.buffer.bytes": 10485760,
"socket.send.buffer.bytes": 10485760,

Similarly, please share your producer config as well. I'm thinking it may be
related to tuning your cluster.

Thanks,
Prabhjot


On Sat, Nov 28, 2015 at 3:54 PM, Andreas Flinck <andreas.flinck@digitalroute.com> wrote:

> Great, thanks for the information! So it is definitely acks=all we want to
> go for. Unfortunately we have run into a blocking issue in our production-like
> test environment which we have not been able to find a solution for. So
> here it is, ANY idea on how we could possibly find a solution is very much
> appreciated!
>
> Environment:
> Kafka version: kafka_2.11-0.8.2.1
> 5 Kafka brokers and 5 ZooKeeper nodes spread out across 5 hosts
> Using new producer (async)
>
> Topic:
> partitions=10
> replication-factor=4
> min.insync.replicas=2
>
> Default property values used for broker configs and producer.
>
> Scenario and problem:
> Incoming Diameter data (10k TPS) is sent to 5 topics via 5 producers, which
> works great until we start another 5 producers sending to another 5
> topics at the same rate (10k). What happens then is that the producers
> sending to 2 of the topics fill up the buffer and the throughput becomes
> very low, with BufferExhaustedExceptions for most of the messages. When
> checking the latency for the problematic topics it becomes really high
> (around 150 ms). Stopping the 5 producers that were started in the second
> round, the latency goes down to about 1 ms again and the buffer goes
> back to normal. The load is not that high, about 10 MB/s, it is not even
> near disk bound.
> So the questions right now are: why do we get such high latency to
> specifically two topics when starting more producers, even though CPU and
> disk load look unproblematic? And why two topics specifically, is there an
> order of which topics to prioritize when things get clogged for some reason?
>
> Sorry for the quite messy description, we are all kind of new to Kafka
> here!
>
> BR
> Andreas
>
> > On 28 Nov 2015, at 09:26, Prabhjot Bharaj <prabhbharaj@gmail.com> wrote:
> >
> > Hi,
> >
> > This should help :)
> >
> > During my benchmarks, I noticed that if a 5 node Kafka cluster running 1
> > topic is given a continuous injection of 50GB in one shot (using a modified
> > producer performance script, which writes my custom data to Kafka), the
> > last replica can sometimes lag, and it used to catch up at a speed of 1GB
> > in 20-25 seconds. This lag increases if producer performance injects 200GB
> > in one shot.
> >
> > I'm not sure how it will behave with multiple topics. It could have an
> > impact on the overall throughput (because more partitions will be alive on
> > the same broker, thereby dividing the network usage), but I have to test
> > it in a staging environment.
> >
> > Regards,
> > Prabhjot
> >
> > On Sat, Nov 28, 2015 at 12:10 PM, Gwen Shapira <gwen@confluent.io> wrote:
> >
> >> Hi,
> >>
> >> min.insync.replicas is alive and well in 0.9 :)
> >>
> >> Normally, you will have 4 out of 4 replicas in sync. However, if one of
> >> the replicas falls behind, you will have 3 out of 4 in sync.
> >> If you set min.insync.replicas = 3, produce requests will fail if the
> >> number of in-sync replicas falls below 3.
> >>
> >> I hope this helps.
> >>
> >> Gwen
> >>
> >> On Fri, Nov 27, 2015 at 9:43 PM, Prabhjot Bharaj <prabhbharaj@gmail.com>
> >> wrote:
> >>
> >>> Hi Gwen,
> >>>
> >>> How about the min.insync.replicas property?
> >>> Is it still valid in the new version 0.9 ?
> >>>
> >>> We could get 3 out of 4 replicas in sync if we set its value to 3.
> >>> Correct?
> >>>
> >>> Thanks,
> >>> Prabhjot
> >>> On Nov 28, 2015 10:20 AM, "Gwen Shapira" <gwen@confluent.io> wrote:
> >>>
> >>>> In your scenario, you are receiving acks from 3 replicas while it is
> >>>> possible to have 4 in the ISR. This means that one replica can be up to
> >>>> 4000 messages (by default) behind the others. If a leader crashes, there
> >>>> is a 33% chance this replica will become the new leader, thereby losing
> >>>> up to 4000 messages.
> >>>>
> >>>> acks = all requires all replicas in the ISR to ack as long as they are
> >>>> in the ISR, protecting you from this scenario (but leading to high
> >>>> latency if a replica is hanging and is just about to drop out of the ISR).
> >>>>
> >>>> Also, note that in newer versions acks > 1 is deprecated, to protect
> >>>> against such subtle mistakes.
> >>>>
> >>>> Gwen
> >>>>
> >>>> On Fri, Nov 27, 2015 at 12:28 AM, Andreas Flinck <
> >>>> andreas.flinck@digitalroute.com> wrote:
> >>>>
> >>>>> Hi all
> >>>>>
> >>>>> The reason why I need to know is that we have seen an issue when using
> >>>>> acks=all, forcing us to quickly find an alternative. I leave the issue
> >>>>> out of this post, but will probably come back to that!
> >>>>>
> >>>>> My question is about acks=all and the min.insync.replicas property.
> >>>>> Since we have found a workaround for an issue by using acks>1 instead
> >>>>> of all (absolutely no clue why at this moment), I would like to know
> >>>>> what benefit you get from e.g. acks=all and min.insync.replicas=3
> >>>>> instead of using acks=3 in a 5 broker cluster with a replication
> >>>>> factor of 4. To my understanding you would get the exact same level of
> >>>>> durability and security from using either of those settings. However,
> >>>>> I suspect this is not quite the case, from finding hints, without a
> >>>>> proper explanation, that acks=all is preferred.
> >>>>>
> >>>>>
> >>>>> Regards
> >>>>> Andreas
> >>>>
> >>>
> >>
> >
> >
> >
> > --
> > ---------------------------------------------------------
> > "There are only 10 types of people in the world: Those who understand
> > binary, and those who don't"
>
>


--
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"
