kafka-users mailing list archives

From Hanish Bansal <hanish.bansal.agar...@gmail.com>
Subject Re: Killing broker leader
Date Wed, 15 Jan 2014 16:38:30 GMT
Hi Francois,

KAFKA-1193 is probably not due to a misconfiguration; something else may be
missing. I also tried both settings together (max retries 10 and producer
acks -1), and that still caused data loss.
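
For reference, the combination described above (10 retries plus acks -1)
corresponds to the producer-side properties sketched below. This is only an
illustration: the file name producer-retry.properties is hypothetical, the
1000 ms backoff is taken from the earlier commands in this thread, and these
are producer settings, so they do not belong in server.properties. As
reported above, this combination still lost data in that test.

```shell
# Write the producer-side settings discussed in this thread to a file.
# The file name is a hypothetical choice for illustration.
props_file="producer-retry.properties"

cat > "$props_file" <<'EOF'
# Retry a failed send up to 10 times (default is 3).
message.send.max.retries=10
# Wait 1000 ms between retries (default is 100).
retry.backoff.ms=1000
# -1: wait for acknowledgement from all in-sync replicas.
request.required.acks=-1
EOF

# Show the resulting file.
cat "$props_file"
```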


On Wed, Jan 15, 2014 at 9:40 PM, François Langelier
<f.langelier@gmail.com> wrote:

> Yeah, that's what I found... sorry about that...
>
> Well, I tested it out!
> At first, my command was:
>
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
> my-replicated-topic --message-send-max-retries 10 --retry-backoff-ms 1000
>
>
> Then I noticed that only the second message I sent got lost... So I thought
> it could be because my producer got the message but didn't have the time to
> replicate it (?) before it got killed, so I tried this, because I have 3
> brokers:
>
> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
> my-replicated-topic --message-send-max-retries 10 --retry-backoff-ms 1000
> --request-required-acks 2
>
> And now that's working...
>
> So since the beginning I had no "bug" just a misconfiguration...
>
> @Hanish: maybe https://issues.apache.org/jira/browse/KAFKA-1193 is a
> misconfiguration too? I know you tried --message-send-max-retries 10 and
> --request-required-acks -1, but have you tried both together?
>
> Thanks for your help, guys!
>
>
>
> François Langelier
> Software Engineering Student - École de Technologie
> Supérieure<http://www.etsmtl.ca/>
> Captain, Club Capra <http://capra.etsmtl.ca/>
> VP-Communication - CS Games <http://csgames.org> 2014
> Jeux de Génies <http://www.jdgets.com/> 2011 to 2014
> Treasurer, Fraternité du Piranhas <http://fraternitedupiranha.com/>
> 2012-2014
> Organizing Committee, Olympiades ÉTS 2012
> Compétition Québécoise d'Ingénierie 2012 - Senior Competition
>
>
> On Wed, Jan 15, 2014 at 10:44 AM, Jun Rao <junrao@gmail.com> wrote:
>
> > Those are actually producer side configs.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Jan 15, 2014 at 6:51 AM, François Langelier
> > <f.langelier@gmail.com> wrote:
> >
> > > Nope, it's the "3 tries" messages... maybe I did something wrong...
> > >
> > > i put
> > >
> > > message.send.max.retries=10
> > > retry.backoff.ms=1000
> > >
> > > in my server.properties of each broker
> > >
> > > I'm checking it right now!
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Jan 14, 2014 at 7:32 PM, Guozhang Wang <wangguoz@gmail.com>
> > > wrote:
> > >
> > > > Hi Francois, just a quick question: when you set the number of retries
> > > > to 10, does its log still have a "Failed to send messages after 10
> > > > tries." entry?
> > > >
> > > >
> > > > On Tue, Jan 14, 2014 at 11:41 AM, Francois Langelier <
> > > > francois.langelier@mate1inc.com> wrote:
> > > >
> > > > > Of course! As soon as it will close, i'll try!
> > > > >
> > > > >
> > > > > On Tue, Jan 14, 2014 at 2:29 PM, Guozhang Wang <wangguoz@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hanish and Francois,
> > > > > >
> > > > > > The current patch for KAFKA-1193 is still missing something, and I
> > > > > > am currently working on it so that it can be closed soon. Could you
> > > > > > retry the scenario after it is checked in?
> > > > > >
> > > > > > Guozhang
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 14, 2014 at 8:03 AM, Francois Langelier <
> > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > >
> > > > > > > @Guozhang Wang: I set the max retries to 10 and the backoff to
> > > > > > > 1000 ms, but the bug is still there and some messages don't reach
> > > > > > > my consumers...
> > > > > > >
> > > > > > > @Hanish : Yes, it looks like we have the same issue!
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jan 13, 2014 at 9:43 PM, Hanish Bansal <
> > > > > > > hanish.bansal.agarwal@gmail.com> wrote:
> > > > > > >
> > > > > > > > I am not sure, but this may be the same scenario as described in
> > > > > > > > https://issues.apache.org/jira/browse/KAFKA-1193
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jan 14, 2014 at 2:36 AM, Guozhang Wang <wangguoz@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > When the producer has exhausted all retries on sending, the data
> > > > > > > > > will be dropped on the floor. One possible reason for this is
> > > > > > > > > that the leader failover takes long enough for the producer to
> > > > > > > > > fail all 3 retries.
> > > > > > > > >
> > > > > > > > > You may want to tune the following two configs on producers
> > > > > > > > > (https://kafka.apache.org/documentation.html#producerconfigs)
> > > > > > > > > to see if this scenario can be solved:
> > > > > > > > >
> > > > > > > > > message.send.max.retries (default 3)
> > > > > > > > >
> > > > > > > > > retry.backoff.ms (default 100)
> > > > > > > > >
> > > > > > > > > Guozhang
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Jan 13, 2014 at 6:34 AM, Francois Langelier <
> > > > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > > > >
> > > > > > > > > > Yes, I have this message: ERROR Error in handling batch of 1
> > > > > > > > > > events (kafka.producer.async.ProducerSendThread)
> > > > > > > > > > kafka.common.FailedToSendMessageException: Failed to send
> > > > > > > > > > messages after 3 tries.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Sun, Jan 12, 2014 at 10:17 PM, Guozhang Wang <
> > > > > > > > > > wangguoz@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On the producer client log, did you see something like
> > > > > > > > > > > "failed to send ... after .. retries"?
> > > > > > > > > > >
> > > > > > > > > > > Guozhang
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jan 8, 2014 at 11:44 AM, Francois Langelier <
> > > > > > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thank you for your answers
> > > > > > > > > > > >
> > > > > > > > > > > > @Guozhang: I can't find the "ack value" in my console...
> > > > > > > > > > > >
> > > > > > > > > > > > @Marc: I'm testing some stuff on 0.8 before migrating from
> > > > > > > > > > > > 0.7 to 0.8, that's why I'm killing it instead of doing a
> > > > > > > > > > > > controlled shutdown.
> > > > > > > > > > > >
> > > > > > > > > > > > @Jun: I created it using this command:
> > > > > > > > > > > > *bin/kafka-create-topic.sh --zookeeper localhost:2181
> > > > > > > > > > > > --replica 3 --partition 1 --topic my-replicated-topic*
> > > > > > > > > > > > And here is the output of the list topic command:
> > > > > > > > > > > >
> > > > > > > > > > > > *topic: my-replicated-topic partition: 0 leader: 0
> > > > > > > > > > > > replicas: 0,2,1 isr: 0,1,2*
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I continued investigating on my own, and here is some
> > > > > > > > > > > > other information:
> > > > > > > > > > > >
> > > > > > > > > > > >    - I use the *bin/kafka-server-start.sh* script to start
> > > > > > > > > > > >    the servers, and I use the consumer and producer scripts
> > > > > > > > > > > >    in bin/
> > > > > > > > > > > >    - When I kill the leader, for about 5 seconds I receive
> > > > > > > > > > > >    Java exception errors in my consumer consoles, and if I
> > > > > > > > > > > >    try to send messages through the producer console, I also
> > > > > > > > > > > >    get Java exceptions. Furthermore, all the messages I send
> > > > > > > > > > > >    during that time through the producer never reach the
> > > > > > > > > > > >    consumers, even after the "5 seconds"
> > > > > > > > > > > >    - When the "5 seconds" are over, the link is "repaired"
> > > > > > > > > > > >    and all the new messages reach their destinations (not
> > > > > > > > > > > >    those sent within the "5 seconds")
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Dec 18, 2013 at 12:16 AM, Jun Rao <junrao@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > What's the replication factor of the topic? Is it larger
> > > > > > > > > > > > > than 1? You can find out using the list topic command.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jun
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Dec 17, 2013 at 2:39 PM, Francois Langelier <
> > > > > > > > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I installed ZooKeeper and Kafka 0.8 following the quick
> > > > > > > > > > > > > > start
> > > > > > > > > > > > > > (https://kafka.apache.org/documentation.html#quickstart)
> > > > > > > > > > > > > > and when I try to kill my leader, I get a lot of
> > > > > > > > > > > > > > exceptions in my producer and consumer consoles.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Then, after the exceptions stop printing, some of the
> > > > > > > > > > > > > > messages I produce in my console don't print in my
> > > > > > > > > > > > > > consumer console...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The exception I get is "java.net.ConnectException:
> > > > > > > > > > > > > > Connection refused".
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Has anyone already had this problem?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > PS: I have 3 brokers running on my system.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > -- Guozhang
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > -- Guozhang
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > *Thanks & Regards*
> > > > > > > > *Hanish Bansal*
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > -- Guozhang
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>



-- 
*Thanks & Regards*
*Hanish Bansal*
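
A rough way to see why the default retry settings can drop messages during a
leader failover is to compare the retry window (retries times backoff) with
the outage of about 5 seconds reported earlier in this thread. The sketch
below only does that arithmetic. The function name retry_window_ms is made up
for illustration, and the estimate ignores the time each send attempt itself
takes, so treat it as a lower bound rather than an exact figure.

```shell
#!/bin/sh
# Approximate how long a producer keeps retrying one batch:
# number of retries multiplied by the backoff between retries.
retry_window_ms() {
    retries=$1
    backoff_ms=$2
    echo $((retries * backoff_ms))
}

# "About 5 seconds" of errors after killing the leader, as reported
# in the thread.
failover_ms=5000

# First pair: defaults (3 retries, 100 ms). Second pair: tuned values.
for settings in "3 100" "10 1000"; do
    set -- $settings
    window=$(retry_window_ms "$1" "$2")
    if [ "$window" -ge "$failover_ms" ]; then
        verdict="covers"
    else
        verdict="is shorter than"
    fi
    echo "retries=$1 backoff=${2}ms: ${window}ms window $verdict the ${failover_ms}ms failover"
done
```

With the defaults the producer gives up after roughly 300 ms, well inside the
failover window, while 10 retries at 1000 ms span roughly 10 seconds. As the
KAFKA-1193 discussion above shows, a long enough retry window alone may still
not rule out data loss.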
