kafka-users mailing list archives

From Dan Brown <...@metamx.com>
Subject Race when broker reconnects to zk?
Date Fri, 23 Dec 2011 20:48:02 GMT
Hi all, I'm concerned that there's an unsafe race when a broker loses
and reestablishes its zk connection, and I'd like others to weigh in.

At ZookeeperConsumerConnector:204, registerConsumerInZK calls
ZkUtils.createEphemeralPathExpectConflict. At ZkUtils:89, that method
has a case where it finds that the node it wants to create already
exists with the same data; it treats this as success and returns
normally. But couldn't that existing node be a stale ephemeral node
that is about to disappear? If so, the broker will lose its ephemeral
/brokers/ids node and consumers won't be able to find it. In
particular, wouldn't this happen when the broker is disconnected from
zk, reconnects with a new session, and tries to recreate its ephemeral
node before zk has expired the ephemeral node from its previous
session?
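To make the suspected sequence concrete, here is a minimal sketch using a toy in-memory model of ephemeral nodes. FakeZk, create_expect_conflict, create_checking_owner and the node data are all hypothetical stand-ins, not the ZooKeeper, ZkClient or Kafka APIs. create_expect_conflict only mimics the "same data already exists, so treat it as success" case described above. create_checking_owner is an assumed alternative that also compares the node's owning session, in the spirit of ZooKeeper's ephemeral-owner field, and reports a stale node so the caller can retry once it is gone.

```python
from dataclasses import dataclass, field


class NodeExistsError(Exception):
    """Raised when a create targets a path that already exists."""


@dataclass
class FakeZk:
    """Toy model of ephemeral nodes (hypothetical; not a real client API)."""
    # path -> (data, id of the session that owns the ephemeral node)
    nodes: dict = field(default_factory=dict)

    def create_ephemeral(self, session: int, path: str, data: str) -> None:
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = (data, session)

    def expire_session(self, session: int) -> None:
        # Expiring a session removes every ephemeral node it owns.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session}


def create_expect_conflict(zk: FakeZk, session: int, path: str, data: str) -> None:
    """Mimics the described behavior: identical existing data counts as success."""
    try:
        zk.create_ephemeral(session, path, data)
    except NodeExistsError:
        existing_data, _owner = zk.nodes[path]
        if existing_data != data:
            raise
        # Returns normally even if the node belongs to an old, expiring session.


def create_checking_owner(zk: FakeZk, session: int, path: str, data: str) -> bool:
    """Hypothetical variant: only accept a node owned by the current session.

    Returns False when a stale node from another session is in the way,
    meaning the caller should retry after that node disappears.
    """
    try:
        zk.create_ephemeral(session, path, data)
        return True
    except NodeExistsError:
        existing_data, owner = zk.nodes[path]
        if existing_data != data:
            raise
        return owner == session


def broker_registered_after_reconnect(use_owner_check: bool) -> bool:
    """Replays the reconnect scenario; True if the broker node survives."""
    zk = FakeZk()
    path, data = "/brokers/ids/0", "host:9092"  # hypothetical broker id and data
    zk.create_ephemeral(1, path, data)          # registered under session 1
    # Connection drops; broker reconnects as session 2 before session 1 expires.
    if use_owner_check:
        registered = create_checking_owner(zk, 2, path, data)  # False: stale node
    else:
        create_expect_conflict(zk, 2, path, data)
        registered = True  # the caller believes registration succeeded
    zk.expire_session(1)   # the old session finally times out
    if not registered:
        create_checking_owner(zk, 2, path, data)  # retry now creates our own node
    return path in zk.nodes
```

In this model, broker_registered_after_reconnect(False) returns False: the "successful" call left only the stale node in place, and it vanished when session 1 expired. broker_registered_after_reconnect(True) returns True because the stale node is detected and the create is retried. A real implementation would presumably wait on a watch rather than retry at a fixed point.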

I'm seeing a case where one of our brokers was running but had no
/brokers/ids node, and the logs showed that it had recently
reconnected to zk, so I suspect this is the explanation. (To fix it, I
just restarted the broker.) I'm running an old RC of kafka-0.6, but
judging from the latest code (from the git mirror), the code path
described above is unchanged from what we're running.

 Dan
