kafka-users mailing list archives

From Dan Brown <...@metamx.com>
Subject Race when broker reconnects to zk?
Date Fri, 23 Dec 2011 20:48:02 GMT
Hi all, I'm concerned that there's an unsafe race when a broker loses
and reestablishes its zk connection, and I'd like others to weigh in.

On ZookeeperConsumerConnector:204, registerConsumerInZK calls
ZkUtils.createEphemeralPathExpectConflict, which on ZkUtils:89 has a
case where it observes that the node and data it wants to create
already exist, and it considers this a success and returns normally.
But isn't it possible for that already created node to be a stale
ephemeral node that is about to disappear, in which case the broker
will lose its ephemeral /brokers/ids node and consumers won't be able
to find it? In particular, wouldn't this occur when the broker gets
disconnected from zk, reconnects with a new session, and tries to
recreate its ephemeral node before zk has timed out the ephemeral node
from its previous session?
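
The sequence described above can be sketched with a toy in-memory model. This is only an illustration of the suspected race, not Kafka's or ZooKeeper's actual code: `ToyZk`, `create_expect_conflict`, `create_checking_owner`, the path `/brokers/ids/0` and the data string are all hypothetical names chosen for the example. The only assumption about real ZooKeeper is that each ephemeral node is tied to the session that created it and is removed when that session expires.

```python
class ToyZk:
    """In-memory stand-in for ephemeral-node bookkeeping (NOT the real client API)."""

    def __init__(self):
        self.nodes = {}  # path -> (data, owner_session_id)

    def create_ephemeral(self, session_id, path, data):
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = (data, session_id)

    def expire_session(self, session_id):
        # Every ephemeral node owned by the expired session is removed.
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != session_id}


def create_expect_conflict(zk, session_id, path, data):
    """Mimics the behaviour described in the message: if the node already
    exists with the same data, treat that as success."""
    try:
        zk.create_ephemeral(session_id, path, data)
    except FileExistsError:
        existing_data, _owner = zk.nodes[path]
        if existing_data != data:
            raise
        # Same data: returns normally, although another session may own the node.


def create_checking_owner(zk, session_id, path, data):
    """Hypothetical safer variant: accept an existing node only if this
    session owns it. Returns False when the node is stale and the caller
    should retry after the old session has expired."""
    try:
        zk.create_ephemeral(session_id, path, data)
        return True
    except FileExistsError:
        _data, owner = zk.nodes[path]
        return owner == session_id


path, data = "/brokers/ids/0", "host:9092"  # illustrative values only

zk = ToyZk()
create_expect_conflict(zk, 1, path, data)   # original registration, session 1
# Connection lost; the process reconnects as session 2 before session 1 expires.
create_expect_conflict(zk, 2, path, data)   # returns normally on the stale node
stale_accepted = path in zk.nodes
zk.expire_session(1)                        # server finally times out session 1
registered_after_expiry = path in zk.nodes  # the node is gone, nobody recreates it

zk2 = ToyZk()
create_expect_conflict(zk2, 1, path, data)
owner_check_result = create_checking_owner(zk2, 2, path, data)  # detects stale node
```

In the real ZooKeeper API the owning session id of an ephemeral node is available in the node's stat (the `ephemeralOwner` field), so a check along the lines of `create_checking_owner` seems feasible, though whether it is the right fix here is for others to judge.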

I saw one of our brokers running with no /brokers/ids node, and its
logs indicated that it had recently reconnected to zk, so I suspect
this race is the explanation. (To fix it, I just restarted the
broker.) I'm running an old RC for kafka-0.6, but in the latest code
(from the git mirror) the code path described above looks the same as
in the version we're running.

