kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <matth...@confluent.io>
Subject Re: Kafka Streams - offset number affected by "exactly once processing guarantee"
Date Fri, 21 Jun 2019 00:29:39 GMT
The observed behavior is expected.

> For example, if we send 2615 events to an empty topic, we expect the end of
>> the topic to be offset 2616.

This is a wrong expectation. Even if Kafka behaves that way for non-EOS,
there is no "contract" that guarantees that offsets are consecutive.
Kafka only guarantees that offsets are ascending.

In fact, if you enable log compaction, there are also "missing offsets"
after a topic was compacted.

For EOS, each time a transaction is committed, a commit marker is
written into the topic, and the commit marked "eats" one offset.
(Similarly, if a transaction is aborted writing an abort marker).

>> In addition, there are "missing offsets" in the topic. There seems to be
>> nothing inside message offset 5, but it still counts as a message...I used
>> Kafkacat to print out that message, and I have specified  -Z (Print NULL
>> messages and keys as "NULL"(instead of empty)), but it didn't print out

Those markers are internal and related to your "missing offsets". You
cannot read the markers as clients will skip over them on read.

There is also a related issue with regard to transnational markers:
https://issues.apache.org/jira/browse/KAFKA-6607 Lag will never be
reported as zero, because of the tailing transactional markers.



The number of stored messages in a partition are not exposed in Kafka
and I am not aware of a way to get this information.



-Matthias


On 6/20/19 2:42 PM, Trinity Xia wrote:
> Hi,
> 
> We are using Kafka Streams 2.2.1 and Kafka 2.2.0, and we noticed that the
> end offset number is larger than the numbers of events sent to a topic if
> we set *processing guarantee* as *exactly once* in a Kafka Streams app.
> 
> For example, if we send 2615 events to an empty topic, we expect the end of
> the topic to be offset 2616. However,  if we set *processing
> guarantee* as *exactly
> once,* the end of the topic is offset 2619 instead.
> 
> Key: 5-****5 offset: 0
> Key: 5-****-5 offset: 1
> Key: 5-****-5 offset: 2
> Key: 5-****-5 offset: 3
> Key: 5-****-5 offset: 4
> Key: 25-****-f offset: 6
> Key: 25-****-f offset: 7
> Key: 25-****-f offset: 8
> .....
> 
> % Reached end of topic tracking_events[0] at offset 2619: exiting
> 
> In addition, there are "missing offsets" in the topic. There seems to be
> nothing inside message offset 5, but it still counts as a message...I used
> Kafkacat to print out that message, and I have specified  -Z (Print NULL
> messages and keys as "NULL"(instead of empty)), but it didn't print out
> anything either...
> 
> Has anyone encountered the same issue before?  Are there any other methods
> to calculate the number of messages inside a topic efficiently?  Thanks for
> the help :)
> 
> Trinity
> 


Mime
View raw message