kafka-users mailing list archives

From Json Tu <kafka...@126.com>
Subject KAFKA-4360 issue
Date Tue, 01 Nov 2016 03:21:32 GMT
Hi,
	Could someone please discuss this issue in KAFKA-4360? Thanks.

> On Nov 1, 2016, at 10:54 AM, huxi (JIRA) <jira@apache.org> wrote:
> 
> 
>   [ https://issues.apache.org/jira/browse/KAFKA-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624155#comment-15624155 ]
> 
> huxi commented on KAFKA-4360:
> -----------------------------
> 
> Excellent analysis! What intrigues me is whether this is a deadlock issue or a liveness
> issue. Here is my analysis:
> 1. Say at time T1 the zookeeper session expires, so the 'handleNewSession' method of
> SessionExpirationListener is executed, thereby obtaining the controller lock
> (controllerContext.controllerLock).
> 2. It then invokes the 'onControllerResignation' method to have the current controller quit,
> which shuts down the leader rebalance scheduler by calling KafkaScheduler.shutdown.
> 3. The 'shutdown' method shuts down the ScheduledThreadPoolExecutor and blocks until
> all tasks have completed execution after the shutdown request.
> 4. If any task was submitted before shutdown was called, the check-imbalance thread
> starts by checking isActive, which acquires the controller lock at the very beginning,
> and is soon blocked because the lock is already held by the main thread.
> 5. In that case, the main thread blocks in the onControllerResignation method until one
> day has elapsed (the default) or you interrupt the check thread.
> 
> Does it make sense?
> 
> 
>> Controller may deadLock when autoLeaderRebalance encounter zk expired
>> ---------------------------------------------------------------------
>> 
>>               Key: KAFKA-4360
>>               URL: https://issues.apache.org/jira/browse/KAFKA-4360
>>           Project: Kafka
>>        Issue Type: Bug
>>        Components: controller
>>  Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
>>          Reporter: Json Tu
>>            Labels: bugfix
>>       Attachments: yf-mafka2-common02_jstack.txt
>> 
>> Original Estimate: 168h
>> Remaining Estimate: 168h
>> 
>> When the controller has a checkAndTriggerPartitionRebalance task in autoRebalanceScheduler
>> and the zk session expires at that moment, it will run into a deadlock.
>> The scenario can be reconstructed as follows. When the zk session expires, the zk thread
>> calls handleNewSession, which is defined in SessionExpirationListener, and acquires
>> controllerContext.controllerLock. It then calls autoRebalanceScheduler.shutdown(), which
>> needs to complete all tasks in autoRebalanceScheduler. But that thread pool also needs
>> controllerContext.controllerLock, which is already owned by the zk callback thread, so
>> the two run into a deadlock.
>> This causes at least two problems. First, the broker's id cannot be registered in
>> zookeeper, so the new controller considers the broker dead. Second, the process cannot
>> be stopped by kafka-server-stop.sh, because the shutdown function cannot get
>> controllerContext.controllerLock either; we cannot shut down kafka except by using kill -9.
>> In my attachment I uploaded a jstack file, which was created when my kafka process
>> could not be shut down by kafka-server-stop.sh.
>> I have met this scenario several times. I think this may be a bug that is not yet solved
>> in kafka.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
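
For anyone who wants to see the blocking pattern from huxi's steps 1-5 in isolation, here is a minimal sketch using only java.util.concurrent. It is not Kafka code: the class name ControllerLockSketch, the method shutdownWhileHoldingLock and the short timeout are made up for illustration, and the ReentrantLock only stands in for controllerContext.controllerLock. One thread takes the lock, shuts down a scheduler and waits for it, while an already-started task on that scheduler needs the same lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ControllerLockSketch {

    /**
     * Returns true if the scheduler terminated within waitMillis while the
     * caller still held the lock, false if the wait timed out.
     */
    public static boolean shutdownWhileHoldingLock(long waitMillis) {
        // Hypothetical stand-in for controllerContext.controllerLock.
        ReentrantLock controllerLock = new ReentrantLock();
        // Hypothetical stand-in for the auto rebalance scheduler.
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        CountDownLatch taskStarted = new CountDownLatch(1);
        try {
            boolean terminated;
            controllerLock.lock(); // step 1: the session-expiration callback takes the lock
            try {
                // Step 4: a check task that was submitted before shutdown is called.
                scheduler.schedule(() -> {
                    taskStarted.countDown();
                    controllerLock.lock(); // blocks: the callback thread owns the lock
                    try {
                        // the imbalance check would run here
                    } finally {
                        controllerLock.unlock();
                    }
                }, 0, TimeUnit.MILLISECONDS);
                taskStarted.await(); // make sure the task is running before shutdown

                // Steps 2-3: shut down and wait for running tasks, lock still held.
                scheduler.shutdown();
                terminated = scheduler.awaitTermination(waitMillis, TimeUnit.MILLISECONDS);
            } finally {
                controllerLock.unlock();
            }
            // Once the lock is released the task can finish and the pool terminates.
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
            return terminated;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("terminated while holding lock: " + shutdownWhileHoldingLock(300));
    }
}
```

With a bounded wait the caller eventually gives up and releases the lock, which lets the task finish. With a wait as long as the one day mentioned in step 5, the lock holder would look stuck for practical purposes, which matches the symptoms in the report.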


