kafka-users mailing list archives

From Jiangjie Qin <j...@linkedin.com.INVALID>
Subject Re: Got conflicted ephemeral node exception for several hours
Date Mon, 13 Jul 2015 03:21:02 GMT
Hi Tao,

We see this error from time to time but have not considered it a big
issue. Is there a reason it bothers you so much?
I'm not sure whether throwing an exception to the user in this case is good
handling. What is the user supposed to do in that case other than
retry?

Thanks,

Jiangjie (Becket) Qin

On 7/12/15, 7:16 PM, "tao xiao" <xiaotao183@gmail.com> wrote:

>We saw the error again in our cluster. Has anyone had the same issue before?
>
>On Fri, 10 Jul 2015 at 13:26 tao xiao <xiaotao183@gmail.com> wrote:
>
>> Bump the thread. Any help would be appreciated.
>>
>> On Wed, 8 Jul 2015 at 20:09 tao xiao <xiaotao183@gmail.com> wrote:
>>
>>> Additional info
>>> Kafka version: 0.8.2.1
>>> zookeeper: 3.4.6
>>>
>>> On Wed, 8 Jul 2015 at 20:07 tao xiao <xiaotao183@gmail.com> wrote:
>>>
>>>> Hi team,
>>>>
>>>> I have 10 high-level consumers connecting to Kafka, and one of them kept
>>>> complaining about a "conflicted ephemeral node" for about 8 hours. The log
>>>> was filled with the exception below:
>>>>
>>>> [2015-07-07 14:03:51,615] INFO conflict in
>>>> /consumers/group/ids/test-1435856975563-9a9fdc6c data:
>>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}
>>>> stored data:
>>>> {"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275558570"}
>>>> (kafka.utils.ZkUtils$)
>>>> [2015-07-07 14:03:51,616] INFO I wrote this conflicted ephemeral node
>>>> [{"version":1,"subscription":{"test.*":1},"pattern":"white_list","timestamp":"1436275631510"}]
>>>> at /consumers/group/ids/test-1435856975563-9a9fdc6c a while back in a
>>>> different session, hence I will backoff for this node to be deleted by
>>>> Zookeeper and retry (kafka.utils.ZkUtils$)
>>>>
>>>> In the meantime, ZooKeeper reported the exception below for the same time
>>>> span:
>>>>
>>>> 2015-07-07 22:45:09,687 [myid:3] - INFO  [ProcessThread(sid:3
>>>> cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException
>>>> when processing sessionid:0x44e657ff19c0019 type:create cxid:0x7a26
>>>> zxid:0x3015f6e77 txntype:-1 reqpath:n/a Error
>>>> Path:/consumers/group/ids/test-1435856975563-9a9fdc6c
>>>> Error:KeeperErrorCode = NodeExists for /consumers/group/ids/test-1435856975563-9a9fdc6c
>>>>
>>>> In the end ZooKeeper timed out the session and the consumers triggered
>>>> a rebalance.
>>>>
>>>> I know that the conflicted ephemeral node handling exists to work around a
>>>> ZooKeeper bug in which session expiration and ephemeral node deletion are
>>>> not done atomically. But as the ZooKeeper log indicates, ZooKeeper never
>>>> got a chance to delete the ephemeral node, which made me think that the
>>>> session was not expired at that time. And for some reason ZooKeeper fired
>>>> a session expire event, which subsequently invoked ZKSessionExpireListener.
>>>> I was just wondering if anyone has ever encountered a similar issue before,
>>>> and what I can do on the ZooKeeper side to prevent this?
>>>>
>>>> Another problem is that the createEphemeralPathExpectConflictHandleZKBug
>>>> call is wrapped in a while(true) loop that runs forever until the
>>>> ephemeral node is created. Would it be better to employ an
>>>> exponential retry policy with a maximum number of retries, so that it has
>>>> a chance to re-throw the exception back to the caller and let the caller
>>>> handle it in situations like the one above?
>>>>
>>>>
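
The bounded exponential retry proposed in the quoted message could look roughly like the sketch below. It is only an illustration and not Kafka code: BoundedRetry, callWithBackoff, RetriesExhaustedException and all parameters are hypothetical names, and the Callable stands in for whatever call creates the ephemeral node.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper, not part of Kafka or ZkUtils. */
final class BoundedRetry {

    /** Thrown once every attempt has failed; carries the last failure as its cause. */
    static final class RetriesExhaustedException extends Exception {
        RetriesExhaustedException(int attempts, Throwable cause) {
            super("gave up after " + attempts + " attempts", cause);
        }
    }

    /**
     * Runs the action up to maxAttempts times. The sleep between attempts
     * starts at initialBackoffMs and doubles each time, capped at maxBackoffMs.
     */
    static <T> T callWithBackoff(Callable<T> action, int maxAttempts,
                                 long initialBackoffMs, long maxBackoffMs)
            throws RetriesExhaustedException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException e) {
                throw e; // do not swallow interruption
            } catch (Exception e) {
                last = e; // remember the failure and retry
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
        // Unlike a while(true) loop, the caller gets to decide what happens next.
        throw new RetriesExhaustedException(maxAttempts, last);
    }
}
```

With a helper like this, the caller that catches RetriesExhaustedException could, for example, close and recreate the consumer connector instead of spinning on the same node for hours.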

