kafka-users mailing list archives

From Amit K <amitk....@gmail.com>
Subject Kafka java consumer processes duplicate messages
Date Mon, 01 Aug 2016 17:11:19 GMT
Hi,

I am fairly new to Kafka. I have set up a 3-node Kafka cluster (1 broker
per machine) with a 3-node ZooKeeper cluster. I am using Kafka version
0.9.0.0.

The setup works fine: my single producer pushes a JSON string to a Kafka
topic with 3 partitions and a replication factor of 2. On the consumer side
I have an application with 3 consumer threads (I assume each consumer
thread reads from its own dedicated partition). The consumer reads the JSON
and persists it to the DB in a separate thread. The consumer properties are
as follows:

topic=TestTopic2807
bootstrap.servers=XXX.221:9092,XXX.222:9092,XXX.221:9092
topic.consumer.threads=3
group.id=EOTG
client.id=EOTG
enable.auto.commit=true
auto.commit.interval.ms=10000
session.timeout.ms=30000
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer


The consumer thread routine is as follows. Each consumer runs the following
in its own thread and spawns a new thread for the DB operation (I know a DB
operation failure can be an issue, but I will fix that soon):

ConsumerRecords<String, String> records = consumer.poll(20);
if (!records.isEmpty()) {
    for (ConsumerRecord<String, String> record : records) {

        String eOCJSONString = record.value();

        logger.info("Received the records at consumer id:" + consumerId +
            ". Record topic:" + record.topic() +
            ". Record partition:" + record.partition() +
            ". Record offset id:" + record.offset());
        logger.info("\n Record:" + eOCJSONString);

        if (eOCJSONString.startsWith("{")) {
            OCBean ocBean = gson.fromJson(record.value(), EOCBean.class);
            executorServiceWorker.submit(new OCWorker(ocBean, consumerId));
            :
}

The problem occurs when I load test the application by sending 30k messages
(JSON strings) from the single producer and bring down one of the brokers
while the consumer is consuming. I observed that many of the messages are
processed more than once (~200-800 duplicates). I repeated this experiment
a few times and always saw many messages read twice by a consumer thread.
I tried bringing down one broker and also two brokers.

Is this normal?
Should I switch to manual offset commits instead of auto commit?
Or should I manually assign the partitions in the program rather than
letting the brokers manage them?
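
As background for the auto-commit question, here is a toy model of a single
partition in plain Java (no Kafka client involved). It is only a sketch under
stated assumptions, not a diagnosis of this cluster: the class and method
names are made up for illustration, and "commit every N records" stands in
for the timer-based commit configured by auto.commit.interval.ms. It shows
the general mechanism by which records processed after the last committed
offset are read again once consumption resumes from that offset, for example
after a rebalance or restart.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one partition; hypothetical names, no Kafka API calls. */
final class ReplayWindowSketch {

    /**
     * Processes offsets [0, interruptedAt) in order, committing the position
     * every commitEvery records (an assumed stand-in for a commit timer),
     * then resumes from the last committed position.
     * Returns the offsets that would be processed a second time.
     */
    static List<Long> redeliveredOffsets(long interruptedAt, long commitEvery) {
        long committed = 0; // next offset to read after resuming
        for (long offset = 0; offset < interruptedAt; offset++) {
            // The record at this offset would be processed here.
            long position = offset + 1;
            if (position % commitEvery == 0) {
                committed = position; // periodic commit
            }
        }
        // Everything between the last commit and the interruption is re-read.
        List<Long> replayed = new ArrayList<>();
        for (long offset = committed; offset < interruptedAt; offset++) {
            replayed.add(offset);
        }
        return replayed;
    }

    public static void main(String[] args) {
        // Sparse commits: the last commit was at position 800.
        System.out.println(redeliveredOffsets(1050, 400).size()); // prints 250
        // Commit after every record: nothing is re-read.
        System.out.println(redeliveredOffsets(1050, 1).size());   // prints 0
    }
}
```

In this model the number of re-read records depends only on how far
processing has run ahead of the last commit, which is why committing more
often (or committing manually after each processed batch) narrows the window
but does not by itself make processing exactly-once.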

Am I missing something very important here?

Also,
I observed that kafka-python had a similar bug, which was fixed in 0.9.2
(https://github.com/dpkp/kafka-python/issues/189), but I believe no such
issue has been reported for Java.

Thanks,
