kafka-users mailing list archives

From Eric Kolotyluk <e...@kolotyluk.net>
Subject Re: Kafka Operational Wierdness
Date Wed, 13 Sep 2017 15:21:46 GMT
Some more detail on this issue:

On a hunch I tried restarting my docker-compose stack a few more times. 
Still the same problem: my application, using the Kafka Client APIs, 
claims it is talking to Kafka, but the Kafka logs disagree.

So I restarted the stack once more. With 'docker-compose up' this is a 
very clean start. Then I waited about 10 minutes until I saw

kafka-1_1    | [2017-09-13 15:00:47,214] INFO [Group Metadata Manager on Broker 1001]: Removed 0 expired offsets in 0 milliseconds.
kafka-3_1    | [2017-09-13 15:00:47,595] INFO [Group Metadata Manager on Broker 1002]: Removed 0 expired offsets in 0 milliseconds.
kafka-2_1    | [2017-09-13 15:00:47,771] INFO [Group Metadata Manager on Broker 1003]: Removed 0 expired offsets in 0 milliseconds.

in the log. When I reran my application, the Kafka logs suddenly came 
alive with indications that topics were being created, among other things.

I have been using Kafka for a couple months now, and this is very new 
behavior I have not seen until a week ago. I am used to being able to 
run my application immediately after the Kafka Stack comes up in Docker. 
Operationally now it seems I have to wait 10 minutes after starting Kafka.
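
One way to avoid guessing how long to wait is to have the application poll until the topics it asked for are actually visible, instead of trusting the create response. Below is a minimal Python sketch of that idea. The names wait_for_topics and list_topics are hypothetical and not part of any Kafka client; list_topics stands in for whatever topic-listing call your client library provides, and the 10 minute default timeout only mirrors the delay described above.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_topics(
    list_topics: Callable[[], Iterable[str]],  # hypothetical: wraps your client's topic listing
    expected: Set[str],
    timeout_s: float = 600.0,   # assumption: mirrors the ~10 minute delay observed
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until every expected topic is listed, or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        try:
            visible = set(list_topics())
        except Exception:
            # Treat a failed lookup as "not ready yet" and keep polling.
            visible = set()
        if expected <= visible:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The application would call this right after requesting topic creation and refuse to start producing if it returns False, which at least turns a silent failure into a visible one.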

Of course I am still dealing with the NotLeaderForPartitionException 
problem, which is also new, and breaks my application, but at least I 
seem to have a repeatable path to that problem.

Cheers, Eric

On 2017-09-12 2:43 PM, Eric Kolotyluk wrote:
> The last few days I have been seeing a problem I do not know how to 
> explain.
> For months I have been successfully running Kafka/Zookeeper under 
> docker, and my application seems to work fine. Lately, when I run 
> Kafka under either docker-compose on my developer system, or 'docker 
> stack deploy' on a Docker Swarm on AWS, here is what I am seeing:
> According to the logs, Zookeeper/Kafka seem to start okay, and the 3 
> brokers I have configured seem to find each other. The logs look 
> pretty normal. Then I start my application, and my application logs 
> show that it has connected to the Kafka Cluster okay, it indicates 
> that it has created the topics okay. However, there is nothing in the 
> Kafka logs to show any kind of connection from my application, let 
> alone topics being created. Sure enough, when I rerun my application, 
> it cannot find the topics, it tries to create them again, and gets a 
> successful response from the Kafka Admin Client. Nope, they were not 
> created.
> When I shut down Kafka, the logs show the shutdown sequence for all 
> the brokers and zookeeper. I cannot understand why the Kafka Client 
> Library is not showing any errors when the Kafka logs are not showing 
> any connection or operations.
> I tried both Kafka and -- same problem.
> Been trying to figure out this problem all morning, bashing my head 
> against the wall.
> *Then I go to lunch*, and a couple hours later I try one more time. 
> Behold, suddenly I can see the Kafka logs reporting they have created 
> the topics my application requested. But now I am stuck with the 
> infamous org.apache.kafka.common.errors.NotLeaderForPartitionException 
> problem again. This is another new problem that has started recently. 
> Unfortunately I have wasted so many hours fighting the first problem 
> that I have not been able to dig into this one.
> What could possibly be the explanation for this not working, and then 
> working again after a few hours?
> It seems insanely difficult to operate a Kafka cluster in any kind of 
> stable configuration that does not fail randomly.
> Can anyone offer any kind of advice on what the problem might be?
> Is it better to just give up trying to operate our own Kafka cluster 
> and use Kinesis instead?
> Cheers, Eric
