kafka-users mailing list archives

From Steven Taschuk <stasc...@stripe.com.INVALID>
Subject ISR briefly shrinks then expands
Date Thu, 02 May 2019 21:39:59 GMT
Our kafka broker logs show episodes where several partitions have
"Shrinking ISR ..." messages followed, usually less than 2 seconds
later, by corresponding "Expanding ISR ..." messages that restore the
original set of brokers for all the partitions.  Does anyone have any
suggestions on how to investigate this?

On our main cluster, consisting of 12 brokers running kafka 2.0.1,
with 2500-3000 partitions in 300-400 topics (each partition having 3
replicas), we see about 10 such episodes a day, each involving
typically 5-8 partitions.  The set of partitions varies but often
repeats or nearly repeats; to illustrate, here are the partitions
affected in one broker's episodes over the past week:
    2019-04-27T17:51    status-4,dining-11,tax-0,education-0,government-7,locker-19,credit-11,law-15
    2019-04-27T18:01    family-14,health-13,golf-2,phone-14,news-15,peace-13
    2019-04-27T22:59    stock-13,income-2,insurance-0,college-0,district-1,breast-8,back-41
    2019-04-28T02:35    mg-12,executive-18,nursing-12
    2019-04-28T02:51    mg-12,executive-18,nursing-12
    2019-04-28T08:34    health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
    2019-04-28T12:15    health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
    2019-04-28T19:03    health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
    2019-04-28T20:16    health-25,living-10,supra-8,death-15,drug-3,talk-12,cell-0
    2019-04-29T18:16    climate-11,lemon-12,faculty-22,side-4,music-12,police-0,room-0,press-8,parking-3,subject-10,blood-3
    2019-04-30T22:54    living-10,death-15,drug-3,talk-12,cell-12
    2019-05-01T16:52    oil-4,community-5,ice-7,public-2,substance-0,grocery-9,carbon-27,g-12
    2019-05-01T17:01    child-16,community-5,ice-7,public-2,grocery-9,carbon-27,task-3,g-12
    2019-05-01T17:36    community-5,ice-7,public-2,grocery-9,carbon-27,g-12
    2019-05-01T22:14    school-9,interest-2,kitchen-0,hotel-3,carbon-3,heart-14
(I've replaced the actual topic names with common words.)
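
For anyone who wants to build a similar per-episode summary from their own
broker logs, here is a rough sketch in Python. It is not the script used to
produce the table above. It assumes log lines shaped like
"[2019-04-27 17:51:02,123] INFO [Partition status-4 broker=3] Shrinking ISR
from 3,1,2 to 3 (kafka.cluster.Partition)"; the exact layout depends on the
broker's log4j pattern and Kafka version, so the regular expression will
likely need adjusting. The function names and the 30-second grouping gap are
arbitrary choices made for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape; check it against your own server.log before relying on it.
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\] INFO "
    r"\[Partition (?P<partition>\S+) broker=(?P<broker>\d+)\] "
    r"(?P<kind>Shrinking|Expanding) ISR from (?P<old>[\d,]*) to (?P<new>[\d,]*)"
)


def parse_events(lines):
    """Yield (timestamp, kind, partition) for each ISR shrink/expand line."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            yield ts, m.group("kind"), m.group("partition")


def group_episodes(events, gap=timedelta(seconds=30)):
    """Group shrink events into episodes.

    A new episode starts whenever a shrink occurs more than `gap` after the
    previous shrink. Returns a list of (start_time, [partitions]) tuples.
    """
    episodes = []
    last = None
    for ts, kind, partition in sorted(events):
        if kind != "Shrinking":
            continue
        if last is None or ts - last > gap:
            episodes.append((ts, []))
        if partition not in episodes[-1][1]:
            episodes[-1][1].append(partition)
        last = ts
    return episodes


def format_episodes(episodes):
    """Render episodes as 'YYYY-MM-DDTHH:MM    p1,p2,...' lines."""
    return [
        f"{start:%Y-%m-%dT%H:%M}    {','.join(parts)}"
        for start, parts in episodes
    ]
```

Typical use would be something like
format_episodes(group_episodes(parse_events(open("server.log")))), where
"server.log" stands in for wherever the broker writes its log. Comparing the
output across brokers may show whether the affected partitions share a leader
or a follower.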

We see similar behaviour on another cluster, consisting of 7 brokers
running kafka 2.2.0.

We haven't found anything unusual in the surrounding logs, or in
metrics about the network and disk activity of the brokers.

Some similar-looking issues from jira:
https://issues.apache.org/jira/browse/KAFKA-4003
    similar in that the expand happens within a second or two of the shrink
https://issues.apache.org/jira/browse/KAFKA-4674
https://issues.apache.org/jira/browse/KAFKA-3916
    both of these involve disconnections, which we don't see
https://issues.apache.org/jira/browse/KAFKA-7152
    describes a constant churn of shrink/expand, which we don't see
    (also, it was fixed in 2.1.0, and we still see our behaviour on 2.2.0)
