kafka-users mailing list archives

From Eric Kolotyluk <e...@kolotyluk.net>
Subject Kafka Operational Weirdness
Date Tue, 12 Sep 2017 21:43:50 GMT
The last few days I have been seeing a problem I do not know how to explain.

For months I have been successfully running Kafka/Zookeeper under 
docker, and my application seems to work fine. Lately, when I run Kafka 
under either docker-compose on my developer system, or 'docker stack 
deploy' on a Docker Swarm on AWS, here is what I am seeing:

According to the logs, Zookeeper/Kafka seem to start okay, and the 3 
brokers I have configured seem to find each other. The logs look pretty 
normal. Then I start my application, and my application logs show that 
it has connected to the Kafka Cluster okay, it indicates that it has 
created the topics okay. However, there is nothing in the Kafka logs to 
show any kind of connection from my application, let alone topics being 
created. Sure enough, when I rerun my application, it cannot find the 
topics, it tries to create them again, and gets a successful response 
from the Kafka Admin Client. Nope, they were not created.

When I shut down Kafka, the logs show the shutdown sequence for all the 
brokers and zookeeper. I cannot understand why the Kafka Client Library 
is not showing any errors when the Kafka logs are not showing any 
connection or operations.
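
When a client reports success but the broker logs show no connection at all, one thing worth ruling out is that the application is talking to different brokers than the ones whose logs are being read. The following is a minimal sketch, using only the JDK, that prints what each bootstrap entry resolves to from wherever it is run (for example inside the application's container) and whether a TCP connection can be opened. The class name BootstrapCheck and the host names kafka1, kafka2 and kafka3 are placeholders for illustration, not names from the setup described here.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not part of the Kafka client library.
class BootstrapCheck {

    /** Splits "host1:9092,host2:9092" into unresolved host/port pairs. */
    static List<InetSocketAddress> parse(String bootstrapServers) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("expected host:port, got " + trimmed);
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /** Returns true if a TCP connection can be opened within timeoutMillis. */
    static boolean reachable(InetAddress address, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(address, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass the application's real bootstrap.servers value instead.
        String servers = args.length > 0 ? args[0] : "kafka1:9092,kafka2:9092,kafka3:9092";
        for (InetSocketAddress server : parse(servers)) {
            String host = server.getHostString();
            try {
                // A name can resolve to several addresses; check each one.
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    System.out.printf("%s:%d -> %s reachable=%b%n",
                            host, server.getPort(), address.getHostAddress(),
                            reachable(address, server.getPort(), 2000));
                }
            } catch (UnknownHostException e) {
                System.out.printf("%s does not resolve%n", host);
            }
        }
    }
}
```

If the printed addresses do not belong to the broker containers being watched, the mismatch between the client's view and the broker logs at least has a candidate explanation. This only checks name resolution and TCP reachability; it does not speak the Kafka protocol.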

I tried both Kafka 0.11.0.0 and 0.10.2.1 -- same problem.

Been trying to figure out this problem all morning, bashing my head 
against the wall.

*Then I go to lunch*, and a couple hours later I try one more time. 
Behold, suddenly I can see the Kafka logs reporting they have created 
the topics my application requested. But now I am stuck with the 
infamous org.apache.kafka.common.errors.NotLeaderForPartitionException 
problem again. This is another new problem that has started recently. 
Unfortunately, I wasted so many hours fighting the first problem that I 
have not been able to dig into this one.
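
For context on that exception: NotLeaderForPartitionException is a retriable error, and in the 0.10.2 and 0.11.0 producers the retries setting defaults to 0, so a single leader change can surface directly to the application. Below is a minimal sketch of producer settings that allow retries, built with plain java.util.Properties. The configuration keys are real producer settings; the class name ProducerRetrySettings and the values chosen are illustrative assumptions, not tuned recommendations, and this may well not be the cause of the errors described here.

```java
import java.util.Properties;

// Hypothetical helper that assembles producer settings; values are illustrative.
class ProducerRetrySettings {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        // Resend on retriable errors such as a leader change (default is 0 in 0.11).
        props.setProperty("retries", "5");
        // Pause between attempts so a new leader has time to be elected.
        props.setProperty("retry.backoff.ms", "500");
        // One in-flight request per connection keeps ordering intact across retries.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address; substitute the real bootstrap list.
        Properties props = build("kafka1:9092");
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

The resulting Properties object would be passed to the KafkaProducer constructor together with the key and value serializer settings, which are omitted here.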

What could possibly be the explanation for this not working, and then 
working again after a few hours?

It seems insanely difficult to operate a Kafka cluster in any kind of 
stable configuration that does not fail randomly.

Can anyone offer any kind of advice on what the problem might be?

Is it better to just give up trying to operate our own Kafka cluster and 
use Kinesis instead?

Cheers, Eric

