kafka-users mailing list archives

From Andrew Otto <ao...@wikimedia.org>
Subject Re: Zookeeper reconnect failed due to 'state changed (Expired)'
Date Wed, 01 Oct 2014 12:48:15 GMT
I understand that, but does that hold even if the leader quickly (within a
few seconds) rejoins the cluster?  I had thought that ack=1 meant that
messages not yet replicated from a broker that has a serious fatal failure
(disk corruption, etc.) would be lost forever.  But as long as the broker
with the unreplicated messages comes back into the ISR, those messages
should be replicated, no?

Or, would the fact that a new leader has been chosen for the partition, and
new messages have been sent to that new leader, cause the offending broker
to drop the unreplicated messages when it comes back into the ISR?
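As a toy illustration (my sketch, not Kafka source code, and the exact truncation rules vary by Kafka version), this is the failover scenario in miniature: the old leader rejoins as a follower and reconciles its log with the new leader's, discarding entries that were acked with ack=1 but never replicated, even though the broker is back in the ISR:

```python
# Toy model of a failed leader rejoining as a follower. It keeps only the
# prefix both logs agree on, then replicates the remainder from the new
# leader -- so the acked-but-unreplicated entry is dropped, not recovered.
def rejoin_as_follower(old_leader_log, new_leader_log):
    common = 0
    while (common < min(len(old_leader_log), len(new_leader_log))
           and old_leader_log[common] == new_leader_log[common]):
        common += 1
    truncated = old_leader_log[:common]          # unreplicated tail falls away
    return truncated + new_leader_log[common:]   # catch up from the new leader

# m3 was acked (ack=1) but unreplicated; m4 arrived at the new leader later.
print(rejoin_as_follower(["m1", "m2", "m3"], ["m1", "m2", "m4"]))
# prints ['m1', 'm2', 'm4'] -- m3 is gone for good
```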


On Sep 30, 2014, at 7:17 PM, Jun Rao <junrao@gmail.com> wrote:

> With ack=1, acked messages could be lost when the leader fails.
> 
> Thanks,
> 
> Jun
> 
> On Mon, Sep 29, 2014 at 8:04 AM, Andrew Otto <otto@wikimedia.org> wrote:
> 
>> This happened again to me this weekend.  I've done some sleuthing, and
>> definitely can see some crazy paging stats when this lock up happens.  For
>> the curious, more info can be found here:
>> https://bugzilla.wikimedia.org/show_bug.cgi?id=69667.  I had tuned
>> dirty_expire_centisecs from 30 seconds to 10, but this still happened
>> again.  I'll continue to troubleshoot and tune.  This is slow going because it
>> is not regularly reproducible.  I have to make a single change and then
>> wait a week or two for the timeout to occur.
>> 
>> Here's a related question.  When the timeout happens, we lose some
>> messages.  Our producer is varnishkafka, which uses the librdkafka producer
>> client.  librdkafka keeps track of produce errors.  We
>> have kafka.topic.request.required.acks = 1.  According to librdkafka, all
>> messages sent have been ACKed by the leader of the partition to which the
>> messages are sent.  Also, when we lose messages due to this timeout, the
>> broker that times out is always the controller.  When it attempts to
>> reconnect to Zookeeper, we see:
>> 
>>  INFO  kafka.utils.ZkUtils$  - conflict in /controller data:
>> {"version":1,"brokerid":21,"timestamp":"1411879981756"} stored data:
>> {"version":1,"brokerid":22,"timestamp":"1407187809296"}
>> 
>> In the case when a controller drops out of the ISR for a few seconds, is it
>> possible for this confused broker to drop ACKed messages?
>> 
>> 
>> 
>> On Thu, Jul 3, 2014 at 12:48 AM, Jun Rao <junrao@gmail.com> wrote:
>> 
>>> Are you on Linux? We have seen this pattern (user/sys time low and real
>>> time high in GC time) before. In our case, the problem was due to disk
>>> I/Os. When there are lots of dirty pages (in our case, caused by log4j
>>> logging), Linux can draft user threads (in this case GC threads) into
>>> flushing the dirty pages. So all that time in real was spent on disk
>>> I/Os, rather than real GC work. The fix is to tune dirty_expire_centisecs
>>> and dirty_writeback_centisecs to flush dirty pages more frequently and
>>> avoid such drafting.
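For reference, Jun's suggestion maps to the Linux VM sysctls below. The specific values are illustrative assumptions, not taken from the thread (the kernel defaults are 3000 and 500 centiseconds, respectively, which matches Andrew's "from 30 seconds" above):

```
# /etc/sysctl.conf fragment -- values are in centiseconds
vm.dirty_expire_centisecs = 1000      # pages dirty longer than 10s become eligible for flush
vm.dirty_writeback_centisecs = 100    # wake the writeback threads every 1s
```

Flushing smaller batches of dirty pages more often makes it less likely that the kernel conscripts an unrelated user thread (such as a GC thread) into synchronous writeback during a large flush.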
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> 
>>> On Wed, Jul 2, 2014 at 1:32 PM, Andrew Otto <otto@wikimedia.org> wrote:
>>> 
>>>> Hi again!
>>>> 
>>>> I've been having this issue consistently since I first started this
>>>> thread, but it was happening infrequently enough for me to brush it
>>>> aside and just run an election to rebalance brokers.
>>>> 
>>>> I recently expanded (and reinstalled) our Kafka cluster so that it now
>>>> has 4 brokers with a default replication factor of 3 for each
>>>> partition.  I also switched over to the G1GC as recommended here:
>>>> https://kafka.apache.org/081/ops.html (even though we are still
>>>> running Kafka 0.8.0; we hope to upgrade soon).
>>>> 
>>>> Now, only one of the 4 brokers (analytics1021, the same problem broker
>>>> we saw before) gets its ZK connection expired, and even more frequently
>>>> than before.  Previously it was less than once a week; now I am seeing
>>>> this happen multiple times a day.
>>>> 
>>>> I've posted all the relevant logs from a recent event here:
>>>> https://gist.github.com/ottomata/e42480446c627ea0af22
>>>> 
>>>> This includes the GC log on the offending Kafka broker during the time
>>>> this happened.  I am pretty green when it comes to GC tuning, but I do
>>>> see this interesting stat:
>>>> 
>>>> 
>>>> [Times: user=0.14 sys=0.00, real=11.47 secs]
>>>> 
>>>> Did Kafka's JVM really just take 11.47 secs to do a GC there?  I'm
>>>> probably missing something, but I don't see which part of that real
>>>> time summary makes up the bulk of that GC time.
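A quick way to read that entry (my sketch, not from the thread): when real far exceeds user + sys, the pause was mostly spent waiting rather than doing GC work on the CPU, which is exactly the disk-I/O signature Jun describes above. The threshold ratio here is an arbitrary assumption:

```python
# Parse a GC log "[Times: user=... sys=..., real=... secs]" entry and flag
# pauses where wall-clock time dwarfs CPU time -- the signature of threads
# blocked on I/O (e.g. dirty-page writeback) rather than actual GC work.
import re

def gc_pause_blocked_on_io(line, ratio=5.0):
    m = re.search(r"user=([\d.]+) sys=([\d.]+), real=([\d.]+) secs", line)
    if not m:
        return None
    user, sys_t, real = (float(g) for g in m.groups())
    return real > ratio * max(user + sys_t, 0.01)

print(gc_pause_blocked_on_io("[Times: user=0.14 sys=0.00, real=11.47 secs]"))
# prints True
```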
>>>> 
>>>> This is strange, right?  This broker is identically configured to all
>>>> its peers, and should be handling on average the exact same amount and
>>>> type of traffic.  Anyone have any advice?
>>>> 
>>>> Thanks!
>>>> -Andrew Otto
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Mar 21, 2014 at 6:48 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>>>> 
>>>>> I see, that makes sense. Thanks a lot for clarifying!
>>>>> 
>>>>> -Neha
>>>>> 
>>>>> 
>>>>> On Fri, Mar 21, 2014 at 11:01 AM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
>>>>> 
>>>>>> Let me clarify the situation. I forgot to mention that my case might
>>>>>> not be a general one: Netflix uses Apache Curator as its main
>>>>>> zookeeper client, with Kafka's ZkClient bridged to Curator, so the
>>>>>> behavior I have seen may not be typical.
>>>>>> 
>>>>>> Kafka's ZKSessionExpireListener.handleNewSession() reinstates all
>>>>>> ephemeral nodes and watchers, but handleNewSession() was never
>>>>>> invoked in my case. So I created a Netflix-internal version of
>>>>>> ZkClient that replaces ephemeral node creation and watcher
>>>>>> reinstatement.
>>>>>> 
>>>>>> I plan to remove all external dependencies from Kafka soon, and then
>>>>>> I may be able to say more about Kafka's zk resiliency.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Mar 21, 2014 at 7:51 AM, Jun Rao <junrao@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi, Jae,
>>>>>>> 
>>>>>>> My understanding is that ephemeral nodes can only be lost on ZK
>>>>>>> session expiration. Are you saying that there are other cases when
>>>>>>> ephemeral nodes can be lost?
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Jun
>>>>>>> 
>>>>>>> 
>>>>>>> On Thu, Mar 20, 2014 at 9:52 PM, Bae, Jae Hyeon <metacret@gmail.com> wrote:
>>>>>>> 
>>>>>>>> This issue is about zookeeper resiliency.
>>>>>>>> 
>>>>>>>> What I have done is replace ephemeral node creation with Apache
>>>>>>>> Curator's PersistentEphemeralNode recipe, to reinstate ephemeral
>>>>>>>> nodes after a zookeeper blip. All watchers should also be
>>>>>>>> reinstated. Kafka internally handles only the session-expired
>>>>>>>> event, but there are many bad zookeeper situations where Kafka can
>>>>>>>> go haywire.
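The mechanism being worked around can be shown in miniature (a toy model of mine, not real client or Kafka code): a session expiration deletes every ephemeral node owned by that session, so the client must re-create them once a new session connects, which is the step handleNewSession() is responsible for and the step Curator's PersistentEphemeralNode recipe automates:

```python
# Toy ZooKeeper model: ephemeral nodes are tied to the session that made
# them and vanish when that session expires.
class ToyZk:
    def __init__(self):
        self.nodes = {}        # path -> (data, owning session id)
        self.session_id = 1

    def create_ephemeral(self, path, data):
        self.nodes[path] = (data, self.session_id)

    def expire_session(self):
        old = self.session_id
        # The server drops every ephemeral node owned by the expired session.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != old}
        self.session_id += 1   # a new session begins; re-registration must run

zk = ToyZk()
zk.create_ephemeral("/brokers/ids/21", b"broker info")
zk.expire_session()
print("/brokers/ids/21" in zk.nodes)   # prints False: registration is gone
zk.create_ephemeral("/brokers/ids/21", b"broker info")  # reinstate it
print("/brokers/ids/21" in zk.nodes)   # prints True
```

If that reinstatement step never fires (as Jae Hyeon reports for handleNewSession() in the bridged-client setup), the broker simply stays unregistered after the blip.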
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Mar 20, 2014 at 9:23 PM, Neha Narkhede <neha.narkhede@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> 1.  What does 'zookeeper state changed (Expired)' mean?
>>>>>>>>> 
>>>>>>>>> 2.  Has anyone seen issues like this before?  Where zookeeper
>>>>>>>>> connections are flaky enough to cause leader elections?
>>>>>>>>> 
>>>>>>>>> It means zookeeper expired the session. The most common reason for
>>>>>>>>> this is client-side GC (in your case, the client is the Kafka
>>>>>>>>> broker), though if your zookeeper setup is flaky, this might even
>>>>>>>>> point to a server-side I/O bottleneck or GC. Could you inspect the
>>>>>>>>> broker's and zookeeper's gc logs?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Thu, Mar 20, 2014 at 3:26 PM, Andrew Otto <otto@wikimedia.org> wrote:
>>>>>>>>> 
>>>>>>>>>> Hm, just saw something a little fishy.
>>>>>>>>>> 
>>>>>>>>>> (In the following logs, analytics1021 (id 21) and analytics1022
>>>>>>>>>> (id 22) are Brokers, and analytics1023, analytics1024, and
>>>>>>>>>> analytics1025 are Zookeepers.)
>>>>>>>>>> 
>>>>>>>>>> At 2014-03-20 21:12:26, analytics1021 lost its connection to
>>>>>>>>>> zookeeper.  It reconnected to analytics1023, but as it was doing
>>>>>>>>>> so, it logged a 'zookeeper state changed (Expired)' message, which
>>>>>>>>>> apparently caused the newly opened socket to close again.  Am I
>>>>>>>>>> reading that right?
>>>>>>>>>> 
>>>>>>>>>> In the time that it took analytics1021 to finally re-establish a
>>>>>>>>>> zookeeper connection, analytics1022 had become the leader for all
>>>>>>>>>> partitions.  I can rebalance the leaders, but I'd prefer that this
>>>>>>>>>> didn't happen in the first place.
>>>>>>>>>> 
>>>>>>>>>> 1.  What does 'zookeeper state changed (Expired)' mean?
>>>>>>>>>> 2.  Has anyone seen issues like this before?  Where zookeeper
>>>>>>>>>> connections are flaky enough to cause leader elections?
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> -Andrew Otto
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> [2014-03-20 21:12:26,427] 685698013 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 10067ms for sessionid 0x1425d903a3f1f89, closing socket connection and attempting reconnect
>>>>>>>>>> [2014-03-20 21:12:26,528] 685698114 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Disconnected)
>>>>>>>>>> [2014-03-20 21:12:27,252] 685698838 [main-SendThread(analytics1025.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
>>>>>>>>>> [2014-03-20 21:12:27,271] 685698857 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
>>>>>>>>>> [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (Expired)
>>>>>>>>>> [2014-03-20 21:12:27,273] 685698859 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x1425d903a3f1f89 has expired, closing socket connection
>>>>>>>>>> [2014-03-20 21:12:27,273] 685698859 [main-EventThread] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@63203b59
>>>>>>>>>> [2014-03-20 21:12:27,290] 685698876 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - re-registering broker info in ZK for broker 21
>>>>>>>>>> [2014-03-20 21:12:27,290] 685698876 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down
>>>>>>>>>> [2014-03-20 21:12:27,290] 685698876 [main-SendThread()] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server analytics1023.eqiad.wmnet/10.64.5.15:2181
>>>>>>>>>> [2014-03-20 21:12:27,292] 685698878 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to analytics1023.eqiad.wmnet/10.64.5.15:2181, initiating session
>>>>>>>>>> [2014-03-20 21:12:27,313] 685698899 [main-SendThread(analytics1023.eqiad.wmnet:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server analytics1023.eqiad.wmnet/10.64.5.15:2181, sessionid = 0xff429b4201dd2352, negotiated timeout = 6000
>>>>>>>>>> [2014-03-20 21:12:27,314] 685698900 [main-EventThread] INFO org.I0Itec.zkclient.ZkClient - zookeeper state changed (SyncConnected)
>>>>>>>>>> [2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - done re-registering broker
>>>>>>>>>> [2014-03-20 21:12:27,324] 685698910 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.KafkaZooKeeper - Subscribing to /brokers/topics path to watch for new topics
>>>>>>>>>> [2014-03-20 21:12:27,370] 685698956 [ZkClient-EventThread-24-analytics1023.eqiad.wmnet,analytics1024.eqiad.wmnet,analytics1025.eqiad.wmnet/kafka/eqiad] INFO kafka.server.ZookeeperLeaderElector$LeaderChangeListener - New leader is 22
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 

