kafka-users mailing list archives

From Guozhang Wang <wangg...@gmail.com>
Subject Re: Killing broker leader
Date Wed, 15 Jan 2014 19:04:39 GMT
Hello Hanish,

Did you see "failed to send messages after 10 tries" in your producer log?

Guozhang
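
(A rough back-of-the-envelope check on the retry settings discussed in the quoted thread below. This is only a sketch: it assumes the producer sleeps for retry.backoff.ms once per retry and ignores the time spent on the requests themselves. The helper name retry_window_ms and the 5000 ms failover figure, taken from the "about 5 seconds" report, are illustrative assumptions, not Kafka APIs.)

```python
def retry_window_ms(max_retries: int, backoff_ms: int) -> int:
    """Approximate total time spent backing off before the producer gives up.

    Simplifying assumption: one backoff sleep per retry, request time ignored.
    """
    return max_retries * backoff_ms


# Failover duration reported in the thread ("about 5 seconds"); an estimate.
failover_ms = 5000

# message.send.max.retries / retry.backoff.ms: defaults vs. the values tried.
default_window_ms = retry_window_ms(3, 100)
tuned_window_ms = retry_window_ms(10, 1000)

for label, window in (("default", default_window_ms), ("tuned", tuned_window_ms)):
    verdict = "covers" if window >= failover_ms else "is shorter than"
    print(f"{label}: ~{window} ms of backoff {verdict} a {failover_ms} ms failover")
```

Under these assumptions the default settings give up well before a failover of that length finishes, while 10 retries at 1000 ms would outlast it. That is why the retry count in the error message is worth checking.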


On Wed, Jan 15, 2014 at 8:38 AM, Hanish Bansal <
hanish.bansal.agarwal@gmail.com> wrote:

> Hi Francois,
>
> KAFKA-1193 is probably not due to a misconfiguration; something else may be
> missing. I also tried both settings together (max retries 10 and producer
> acks -1), and that still caused data loss.
>
>
> On Wed, Jan 15, 2014 at 9:40 PM, François Langelier
> <f.langelier@gmail.com> wrote:
>
> > Yeah, that's what i found... sorry about that...
> >
> > Well, i tested it out!
> > At first, my command was :
> >
> > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
> > my-replicated-topic --message-send-max-retries 10 --retry-backoff-ms 1000
> >
> >
> > Then I noticed that only the second message I sent got lost... So I thought
> > it could be because the leader got the message but didn't have time to
> > replicate it (?) before it got killed, so I tried this, since I have 3
> > brokers:
> >
> > bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
> > my-replicated-topic --message-send-max-retries 10 --retry-backoff-ms 1000
> > --request-required-acks 2
> >
> > And now that's working...
> >
> > So since the beginning I had no "bug" just a misconfiguration...
> >
> > @Hanish: maybe https://issues.apache.org/jira/browse/KAFKA-1193 is a
> > misconfiguration too? I know you tried  --message-send-max-retries 10 and
> >  --request-required-acks -1 but have you tried both together?
> >
> > Thanks for your help, guys!
> >
> >
> >
> > François Langelier
> > Software Engineering Student - École de Technologie
> > Supérieure <http://www.etsmtl.ca/>
> > Captain, Club Capra <http://capra.etsmtl.ca/>
> > VP Communications - CS Games <http://csgames.org> 2014
> > Jeux de Génies <http://www.jdgets.com/> 2011 to 2014
> > Treasurer, Fraternité du Piranhas <http://fraternitedupiranha.com/>
> > 2012-2014
> > Organizing Committee, Olympiades ÉTS 2012
> > Compétition Québécoise d'Ingénierie 2012 - Senior Competition
> >
> >
> > On Wed, Jan 15, 2014 at 10:44 AM, Jun Rao <junrao@gmail.com> wrote:
> >
> > > Those are actually producer side configs.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Jan 15, 2014 at 6:51 AM, François Langelier
> > > <f.langelier@gmail.com> wrote:
> > >
> > > > Nope, it's the "3 tries" messages... maybe I did something wrong...
> > > >
> > > > i put
> > > >
> > > > message.send.max.retries=10
> > > > retry.backoff.ms=1000
> > > >
> > > > in my server.properties of each broker
> > > >
> > > > I'm checking it right now!
> > > >
> > > >
> > > >
> > > >
> > > > François Langelier
> > > >
> > > >
> > > > On Tue, Jan 14, 2014 at 7:32 PM, Guozhang Wang <wangguoz@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hi Francois, just a quick question: when you set the number of retries to
> > > > > 10, does the log still have a "Failed to send messages after 10 tries."
> > > > > entry?
> > > > >
> > > > >
> > > > > On Tue, Jan 14, 2014 at 11:41 AM, Francois Langelier <
> > > > > francois.langelier@mate1inc.com> wrote:
> > > > >
> > > > > > Of course! As soon as it's closed, I'll try!
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 14, 2014 at 2:29 PM, Guozhang Wang <wangguoz@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hanish and Francois,
> > > > > > >
> > > > > > > The current patch for KAFKA-1193 is still missing something, and I am
> > > > > > > currently working on it so it can be closed soon. Could you retry the
> > > > > > > scenario after it is checked in?
> > > > > > >
> > > > > > > Guozhang
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 14, 2014 at 8:03 AM, Francois Langelier <
> > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > >
> > > > > > > > @Guozhang Wang: I set the max retries to 10 and the backoff to 1000 ms, but
> > > > > > > > the bug is still there and some messages don't reach my consumers...
> > > > > > > >
> > > > > > > > @Hanish : Yes, it looks like we have the same issue!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Jan 13, 2014 at 9:43 PM, Hanish Bansal <
> > > > > > > > hanish.bansal.agarwal@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I am not sure, but this may be the same scenario as described in
> > > > > > > > > https://issues.apache.org/jira/browse/KAFKA-1193
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Tue, Jan 14, 2014 at 2:36 AM, Guozhang Wang <wangguoz@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > When the producer has exhausted all retries on sending, the data will be
> > > > > > > > > > dropped on the floor. One possible reason for this is that the leader
> > > > > > > > > > failover takes long enough for the producer to fail all 3 retries.
> > > > > > > > > >
> > > > > > > > > > You may want to tune the following two configs on producers (
> > > > > > > > > > https://kafka.apache.org/documentation.html#producerconfigs) to see if
> > > > > > > > > > this scenario can be solved:
> > > > > > > > > >
> > > > > > > > > > message.send.max.retries (default 3)
> > > > > > > > > >
> > > > > > > > > > retry.backoff.ms (default 100)
> > > > > > > > > >
> > > > > > > > > > Guozhang
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Mon, Jan 13, 2014 at 6:34 AM, Francois Langelier <
> > > > > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Yes, I have this message: ERROR Error in handling batch of 1 events
> > > > > > > > > > > (kafka.producer.async.ProducerSendThread)
> > > > > > > > > > > kafka.common.FailedToSendMessageException: Failed to send messages after 3
> > > > > > > > > > > tries.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Jan 12, 2014 at 10:17 PM, Guozhang Wang <wangguoz@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On the producer client log, did you see something like "failed to send ...
> > > > > > > > > > > > after .. retries"?
> > > > > > > > > > > >
> > > > > > > > > > > > Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jan 8, 2014 at 11:44 AM, Francois Langelier <
> > > > > > > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Thank you for your answers
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Guozhang: I can't find the "ack value" in my console...
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Marc: I'm testing some stuff on 0.8 before migrating from 0.7 to 0.8, that's
> > > > > > > > > > > > > why I'm killing it instead of using controlled shutdown.
> > > > > > > > > > > > >
> > > > > > > > > > > > > @Jun: I created it using this command:
> > > > > > > > > > > > > *bin/kafka-create-topic.sh --zookeeper localhost:2181 --replica 3
> > > > > > > > > > > > > --partition 1 --topic my-replicated-topic*
> > > > > > > > > > > > > And here is the output of the list topic command:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *topic: my-replicated-topic partition: 0 leader: 0 replicas: 0,2,1 isr: 0,1,2*
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I continued investigating on my own and here is some more information:
> > > > > > > > > > > > >
> > > > > > > > > > > > >    - I use the *bin/kafka-server-start.sh* script to start the servers and
> > > > > > > > > > > > >    I use the consumer and producer scripts in bin/
> > > > > > > > > > > > >    - When I kill the leader, for about 5 seconds I receive Java exception
> > > > > > > > > > > > >    errors in my consumer consoles, and if I try to send messages through
> > > > > > > > > > > > >    the producer console, I also get Java exceptions. Furthermore, all the
> > > > > > > > > > > > >    messages I send during that time through the producers never reach the
> > > > > > > > > > > > >    consumers, even after the "5 seconds"
> > > > > > > > > > > > >    - When the "5 seconds" is over, the link is "repaired" and all the new
> > > > > > > > > > > > >    messages reach their destinations (not those within the "5 seconds")
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Dec 18, 2013 at 12:16 AM, Jun Rao <junrao@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > What's the replication factor of the topic? Is it larger than 1? You can
> > > > > > > > > > > > > > find out using the list topic command.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Jun
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Dec 17, 2013 at 2:39 PM, Francois Langelier <
> > > > > > > > > > > > > > francois.langelier@mate1inc.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I installed zookeeper and kafka 0.8 following the quick start (
> > > > > > > > > > > > > > > https://kafka.apache.org/documentation.html#quickstart) and when I try to
> > > > > > > > > > > > > > > kill my leader, I get a lot of exceptions in my producer and consumer
> > > > > > > > > > > > > > > consoles.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Then, after the exceptions stop printing, some of the messages I produce in
> > > > > > > > > > > > > > > my console don't print in my consumer console...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The exception I get is "java.net.ConnectException: Connection refused".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Has anyone already had this problem?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thank you!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > PS: I have 3 brokers running on my system.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > -- Guozhang
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > *Thanks & Regards*
> > > > > > > > > *Hanish Bansal*
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> >
>
>
>
> --
> *Thanks & Regards*
> *Hanish Bansal*
>



-- 
-- Guozhang
