kafka-users mailing list archives

From Zoran <zoran.ljubi...@bulbtech.com>
Subject Re: Group consumer cannot consume messages if kafka service on specific node in test cluster is down
Date Tue, 30 Jan 2018 14:02:48 GMT
Sorry, I attached the wrong server.properties file. The correct one is
attached now.

Regards.


On 01/30/2018 02:59 PM, Zoran wrote:
> Hi,
>
> I have three servers:
>
> blade1 (192.168.112.31),
> blade2 (192.168.112.32) and
> blade3 (192.168.112.33).
>
> kafka_2.11-1.0.0 is installed on each server.
> ZooKeeper is also installed on blade3 (192.168.112.33:2181).
>
> I created a topic repl3part5 with the following command:
>
> bin/kafka-topics.sh --zookeeper 192.168.112.33:2181 --create --replication-factor 3 --partitions 5 --topic repl3part5
>
> When I describe the topic, it looks like this:
>
> [root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181
>
> Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
>     Topic: repl3part5    Partition: 0    Leader: 2    Replicas: 2,3,1    Isr: 2,3,1
>     Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
>     Topic: repl3part5    Partition: 2    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
>     Topic: repl3part5    Partition: 3    Leader: 2    Replicas: 2,1,3    Isr: 2,1,3
>     Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3,2,1
>
> I have a producer for this topic:
>
> bin/kafka-console-producer.sh --broker-list 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5
>
> and a single consumer:
>
> bin/kafka-console-consumer.sh --bootstrap-server 192.168.112.31:9092,192.168.112.32:9092,192.168.112.33:9092 --topic repl3part5 --consumer-property group.id=zoran_1
>
> Every message sent by the producer is received by the consumer. So
> far, so good.
>
> Now I would like to test failover of the Kafka brokers. If I stop the
> Kafka service on blade3, I get consumer warnings, but all produced
> messages are still consumed.
>
> [2018-01-30 14:30:01,203] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 3 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:30:01,299] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 3 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:30:01,475] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 3 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
>
> Next I started the Kafka service on blade3 again and stopped the
> Kafka service on blade2.
> The consumer showed one warning, but all produced messages were still
> consumed.
>
> [2018-01-30 14:31:38,164] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
>
> Then I started the Kafka service on blade2 again and stopped the
> Kafka service on blade1.
>
> The consumer now shows warnings about node 1 and node 2147483646, and
> also: Asynchronous auto-commit of offsets ... failed: Offset commit
> failed with a retriable exception. You should retry committing
> offsets. The underlying error was: null.
>
> [2018-01-30 14:33:16,393] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 1 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:16,469] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2147483646 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:16,557] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 1 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:16,986] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2147483646 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:16,991] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 1 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:17,493] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2147483646 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:17,495] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 1 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:18,002] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2147483646 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:18,003] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Asynchronous auto-commit of offsets 
> {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, 
> repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, 
> repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset 
> commit failed with a retriable exception. You should retry committing 
> offsets. The underlying error was: null 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2018-01-30 14:33:18,611] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 1 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:18,932] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2147483646 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:18,933] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Asynchronous auto-commit of offsets 
> {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, 
> repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, 
> repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset 
> commit failed with a retriable exception. You should retry committing 
> offsets. The underlying error was: null 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2018-01-30 14:33:19,977] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 2147483646 could not be 
> established. Broker may not be available. 
> (org.apache.kafka.clients.NetworkClient)
> [2018-01-30 14:33:19,978] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Asynchronous auto-commit of offsets 
> {repl3part5-4=OffsetAndMetadata{offset=18, metadata=''}, 
> repl3part5-3=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-2=OffsetAndMetadata{offset=19, metadata=''}, 
> repl3part5-1=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-0=OffsetAndMetadata{offset=20, metadata=''}} failed: Offset 
> commit failed with a retriable exception. You should retry committing 
> offsets. The underlying error was: null 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2018-01-30 14:33:19,979] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Connection to node 1 could not be established. Broker 
> may not be available. (org.apache.kafka.clients.NetworkClient)
>
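A note on the node 2147483646 warnings above: as far as I know, the Java
consumer opens a separate connection to its group coordinator and
registers it under the id Integer.MAX_VALUE minus the broker id. Under
that assumption the id decodes to broker 1, which would mean the group
coordinator for zoran_1 was on blade1. A minimal Python sketch of the
arithmetic (the function name is made up for illustration):

```python
# Integer.MAX_VALUE in Java
JAVA_INT_MAX = 2**31 - 1


def coordinator_broker_id(connection_node_id: int) -> int:
    """Map a coordinator connection id back to the broker id.

    Assumption: the client derives the connection id as
    Integer.MAX_VALUE - broker_id.
    """
    return JAVA_INT_MAX - connection_node_id


print(coordinator_broker_id(2147483646))  # prints 1
```
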
> I tried to solve the problem by adding
> offsets.topic.replication.factor=2 (or 3) to the server.properties
> file on all three servers (one of them is attached), but with no
> success.
> My idea was that the __consumer_offsets topic wasn't replicated across
> the cluster, but it looks like that is not the case here.
>
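For context on why one broker can matter so much: the coordinator of a
consumer group is the broker that leads one particular partition of
__consumer_offsets, chosen from the hash of group.id. Also,
offsets.topic.replication.factor is only applied when that topic is
first created, so changing it later does not add replicas to an
existing topic. The sketch below mimics the partition choice in Python,
assuming the default of 50 offsets partitions
(offsets.topic.num.partitions) and Java's String.hashCode; the helper
names are hypothetical.

```python
def java_string_hashcode(s: str) -> int:
    """Java's String.hashCode() as a signed 32-bit int (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - 2**32 if h >= 2**31 else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """Partition of __consumer_offsets that holds this group's offsets.

    Assumption: the broker uses abs(hashCode) % partition count, with
    Integer.MIN_VALUE mapped to 0.
    """
    h = java_string_hashcode(group_id)
    positive = 0 if h == -2**31 else abs(h)
    return positive % num_partitions


print(offsets_partition_for("zoran_1"))
```

Describing __consumer_offsets with the same kafka-topics.sh --describe
command used in this message, and looking at the Replicas column for
that partition, would show whether it lives only on broker 1.
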
> While the Kafka service on blade1 was down, describing the topic showed:
>
> [root@blade1 kafka]# bin/kafka-topics.sh --describe --topic repl3part5 --zookeeper 192.168.112.33:2181
>
> Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
>     Topic: repl3part5    Partition: 0    Leader: 3    Replicas: 2,3,1    Isr: 3
>     Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3
>     Topic: repl3part5    Partition: 2    Leader: 3    Replicas: 1,2,3    Isr: 3
>     Topic: repl3part5    Partition: 3    Leader: 3    Replicas: 2,1,3    Isr: 3
>     Topic: repl3part5    Partition: 4    Leader: 3    Replicas: 3,2,1    Isr: 3
>
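The describe output above can be checked mechanically. This is a rough
sketch in Python that lists partitions whose Isr is smaller than their
Replicas list; it assumes each partition is on a single, unwrapped line
in the format printed by kafka-topics.sh in this message, and the
function name is invented for the example.

```python
import re

# Matches one per-partition line of `kafka-topics.sh --describe` output.
LINE_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def under_replicated(describe_output: str):
    """Return (partition, leader, missing_replicas) for each partition
    that has assigned replicas missing from its ISR."""
    result = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # summary header or unrelated line
        replicas = [int(x) for x in m.group("replicas").split(",")]
        isr = [int(x) for x in m.group("isr").split(",") if x]
        missing = sorted(set(replicas) - set(isr))
        if missing:
            result.append(
                (int(m.group("partition")), int(m.group("leader")), missing)
            )
    return result


sample = """\
Topic:repl3part5    PartitionCount:5    ReplicationFactor:3 Configs:
    Topic: repl3part5    Partition: 0    Leader: 3    Replicas: 2,3,1    Isr: 3
    Topic: repl3part5    Partition: 1    Leader: 3    Replicas: 3,1,2    Isr: 3
"""

for partition, leader, missing in under_replicated(sample):
    print(partition, leader, missing)
```
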
> The producer now shows the following warning. It still writes
> messages to the topic, but they only increase the consumer lag on the
> partitions:
>
> [2018-01-30 14:37:21,816] WARN [Producer clientId=console-producer] 
> Connection to node 1 could not be established. Broker may not be 
> available. (org.apache.kafka.clients.NetworkClient)
>
> I noticed that while the Kafka service on blade1 is alive, I can
> stop and start blade2 and blade3 in any combination and the consumer
> is always able to consume messages.
> If the Kafka service on blade1 is down, then even if the Kafka
> services on blade2 and blade3 are up and running, the consumer cannot
> consume messages.
>
> After bringing the Kafka service up on blade1, all messages that the
> producer sent while it was down are replayed, and then the following
> is shown in the consumer terminal:
>
> [2018-01-30 14:44:30,817] ERROR [Consumer clientId=consumer-1, 
> groupId=zoran_1] Offset commit failed on partition repl3part5-4 at 
> offset 20: This is not the correct coordinator. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2018-01-30 14:44:30,817] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Asynchronous auto-commit of offsets 
> {repl3part5-4=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-3=OffsetAndMetadata{offset=22, metadata=''}, 
> repl3part5-2=OffsetAndMetadata{offset=20, metadata=''}, 
> repl3part5-1=OffsetAndMetadata{offset=22, metadata=''}, 
> repl3part5-0=OffsetAndMetadata{offset=22, metadata=''}} failed: Offset 
> commit failed with a retriable exception. You should retry committing 
> offsets. The underlying error was: This is not the correct 
> coordinator. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2018-01-30 14:44:31,202] ERROR [Consumer clientId=consumer-1, 
> groupId=zoran_1] Offset commit failed on partition repl3part5-4 at 
> offset 22: This is not the correct coordinator. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
> [2018-01-30 14:44:31,202] WARN [Consumer clientId=consumer-1, 
> groupId=zoran_1] Asynchronous auto-commit of offsets 
> {repl3part5-4=OffsetAndMetadata{offset=22, metadata=''}, 
> repl3part5-3=OffsetAndMetadata{offset=24, metadata=''}, 
> repl3part5-2=OffsetAndMetadata{offset=22, metadata=''}, 
> repl3part5-1=OffsetAndMetadata{offset=24, metadata=''}, 
> repl3part5-0=OffsetAndMetadata{offset=24, metadata=''}} failed: Offset 
> commit failed with a retriable exception. You should retry committing 
> offsets. The underlying error was: This is not the correct 
> coordinator. 
> (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
>
> From then on everything works with no problems or warnings, and the
> system is fully functional.
>
> Can someone explain why the Kafka broker on blade1 is so important,
> and what my options are so that I can stop any of the servers
> (including the Kafka broker on blade1) and still consume messages
> with no delay?
> This is driving me crazy. :)
>
> Can you please help?
>
> Regards.

