kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Thacker <he...@bath.edu>
Subject Re: Consumer position not returning correct offset - 0.10.0.1
Date Mon, 10 Sep 2018 19:22:51 GMT
This was resolved internally - I realised that we had a second consumer running against the
topic which was committing offsets without setting a group.id or auto.commit to false.

Henry

> On 5 Sep 2018, at 08:28, Henry Thacker <henry@bath.edu> wrote:
> 
> Hi,
> 
> In testing - i’ve got a 5 node Kafka cluster with min.insync.replicas set to 4. The
cluster is currently running version 0.10.0.1. 
> 
> There’s an application that publishes to a topic - and on restart, it attempts to read
the full contents of the topic up until the high watermark before then publishing extra messages
to the topic. This particular topic routinely contains between 600,000 and 5 million messages
and consists of a single partition, as we rely on precise ordering. To determine the offset
for the high-water mark, we do something similar to the following:
> 
> // Do not use offset management
> props.put(“enable.auto.commit”, false);
> props.put(“auto.offset.reset”, “latest’);
> 
> // ...
> 
> consumer.assign(tpList);
> consumer.poll(100);
> 
> long maxOffset = consumer.position(tp) - 1;
> 
> This has been working fine for months and then all of a sudden, we found that maxOffset
was returning an offset that could be 10s or even 1000s of offsets away from the real high
watermark. As far as I can tell, the cluster state seems to be OK - there are no offline partitions
or out of sync replicas when we see this issue. We also are not using transactional messages
in Kafka. 
> 
> We found that doing an explicit seekToEnd before checking the position seems to help,
eg:
> 
> // ...
> 
> consumer.assign(tpList);
> consumer.poll(100);
> 
> consumer.seekToEnd(tpList);
> long maxOffset = consumer.position(tp) - 1;
> 
> But I can’t understand why this is necessary - when everything seems to have been working
prior to this? I’m now worried the cluster is in a bad state, and we’re not capturing
the health status correctly or missing some error messages somewhere. 
> 
> Keen for any ideas about what this might be or for things I can try.
> 
> Thanks in advance,
> Henry
> 
> 
> 
> 
> 
> 
> 
> 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message