kafka-users mailing list archives

From Justin Miller <justin.mil...@protectwise.com>
Subject Re: Contiguous Offsets on non-compacted topics
Date Tue, 23 Jan 2018 21:35:00 GMT
Hi Matthias and Guozhang,

Given that information, I think I’m going to try out the following in our data lake persisters
(spark-streaming-kafka): 
https://issues.apache.org/jira/browse/SPARK-17147

Skipping one message out of 10+ billion a day won’t be the end of the world for this topic
and it’ll save me from having to manually restart the process. :)
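
A rough sketch of what "tolerate the gap and keep going" means in practice, independent of Spark. This is not the SPARK-17147 patch itself; `GapTolerantTracker` and its fields are hypothetical names made up for illustration. The idea is to advance to `offset + 1` of whatever record actually arrives, rather than assuming each record is exactly `previous + 1`:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GapTolerantTracker:
    """Tracks the next expected offset for one partition (hypothetical helper)."""

    next_offset: int
    # Inclusive (first_missing, last_missing) ranges seen so far.
    gaps: List[Tuple[int, int]] = field(default_factory=list)

    def accept(self, offset: int) -> bool:
        """Return True if the record at `offset` should be processed."""
        if offset < self.next_offset:
            # Already processed (e.g. a replay after a restart): skip it.
            return False
        if offset > self.next_offset:
            # Hole in the log: note the missing range and carry on
            # instead of failing the whole job.
            self.gaps.append((self.next_offset, offset - 1))
        self.next_offset = offset + 1
        return True


tracker = GapTolerantTracker(next_offset=10)
results = [tracker.accept(o) for o in [10, 11, 13, 13, 14]]
```

Here offset 12 is recorded as a gap and the duplicate 13 is skipped, so nothing has to be restarted by hand.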

These topics aren’t compacted, and we’re still only on the 0.10 message format (switched to
it today), but we were able to reproduce the issue when we restarted the Kafka brokers
migrating from the 0.9.0.0 message format to 0.10.2.

Thanks,
Justin

> On Jan 23, 2018, at 2:31 PM, Guozhang Wang <wangguoz@gmail.com> wrote:
> 
> Hello Justin,
> 
> There are actually multiple reasons that can cause non-contiguous offsets, or
> "holes", in the Kafka partition logs:
> 
> 1. Compaction, which you already know about.
> 2. When transactions are turned on, some offsets are taken by the
> "transaction marker" messages, which are not exposed to the consumer
> since they are only used internally. So from the reader's point of view
> there are holes in the offsets.
> 
> 
> 
> Guozhang
> 
> 
> 
> 
> On Tue, Jan 23, 2018 at 9:52 AM, Justin Miller <
> justin.miller@protectwise.com> wrote:
> 
>> Greetings,
>> 
>> We’ve seen a strange situation where the topic is not compacted but the
>> offset numbers inside the partition (#93) are not contiguous. This only
>> happens about once a day, though, on a topic with billions of messages per day.
>> 
>> next offset = 1786997223
>> next offset = 1786997224
>> next offset = 1786997226
>> next offset = 1786997227
>> next offset = 1786997228
>> 
>> I was wondering if this still holds with Kafka 0.10, 0.11, 1.0:
>> http://grokbase.com/t/kafka/users/12bpnexg1m/dumb-question-about-offsets
>> 
>> Specifically: “In Kafka 0.8, each message is assigned a monotonically
>> increasing, contiguous sequence number per partition, starting with 1.”
>> 
>> We’re on Kafka 1.0 with logs at version 0.9.0.0.
>> 
>> Thanks!
>> Justin
> 
> 
> 
> 
> -- 
> -- Guozhang
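
For anyone wanting to check their own logs for the same symptom, here is a minimal sketch that finds holes in a sequence of observed offsets. `find_gaps` is a hypothetical helper, not part of any Kafka client; the input is the "next offset" listing from the original report, and the offsets are assumed to arrive in increasing order.

```python
from typing import Iterable, List, Tuple


def find_gaps(offsets: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (first_missing, last_missing) ranges between offsets."""
    gaps: List[Tuple[int, int]] = []
    previous = None
    for offset in offsets:
        if previous is not None and offset > previous + 1:
            gaps.append((previous + 1, offset - 1))
        previous = offset
    return gaps


# Values copied from the listing in the original report.
observed = [1786997223, 1786997224, 1786997226, 1786997227, 1786997228]
missing = find_gaps(observed)
print(missing)  # [(1786997225, 1786997225)]
```

The single missing value is 1786997225, which matches the jump from ...224 to ...226 in the report.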

