kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andras Beni <andrasb...@cloudera.com>
Subject Re: ListOffsets parameters
Date Thu, 12 Apr 2018 20:19:36 GMT
Hi Emmett,

ListOffsets API tells you about the log segments belonging to (the given
partitions of) a topic.
I think I better explain how it behaves by example.
I have a topic called 'test2' with three partitions (0..2). I produced 2665
messages to its partition 0. I set up the topic so that it rolls a new
segment after around 530 messages.
My two oldest segments were deleted, so now in my kafka/data/test2-0
directory I have these files:
Let's say I call ListOffset API version 0 now with following parameters
Topic: test2, partition: 0, timestamp: -1 ( == latest), numberOfOffsets :
This will result in an offset list of [2665,2649,2119,1589,1059] One item
for each first offset in a file and the last offset (but not more than 100
If I set numberOfOffsets to 3, I only get 3 items: [2665,2649,2119]
Now let's say I use timestamp -2 ( == oldest) this way I only get back the
oldest offset [1059] regardless of numberOfOffsets.
Using a real timestamp (e.g. 1523559480000) will show me offsets that were
appended to the log before that time (based on last modified date of the
If the timestamp means a moment before the oldest message, the returned
array will be empty.

In version 1 and 2 request and response have changed:
If timestamp == -1, the latest offset is returned (except if isolation
level is read committed, because then the latest stable offset is returned)
if timestamp == -2, the oldest offset is returned
if timestamp is a real time value, the log is searched for the last offset
whose timestamp is smaller than given timestamp.

I hope this helps,

On Mon, Mar 26, 2018 at 6:33 PM, Emmett Butler <emmett@parsely.com> wrote:

> Hi users,
> I'm the maintainer of the PyKafka <https://github.com/parsely/pykafka>
> library and I'm working on improving its support for the ListOffsets API.
> Some questions:
> Kafka version: 1.0.0
> I'm using this documentation
> <https://kafka.apache.org/protocol#The_Messages_ListOffsets> for
> reference.
> In some cases, ListOffsets requests return an empty array of offsets. Is
> this expected behavior? If so, when?
> What format is the Timestamp parameter expected in? I've been using
> milliseconds since epoch (python: time.time() * 1000), but I haven't
> managed to get a response that isn't either [0] or [] using this approach.
> Could this have to do with the number of log segments on my topic, or the
> presence of a time index? How do I make a request including a Timestamp
> (not special values -1 or -2) that returns a valid offset? What is the
> meaning of a [0] response in this context?
> What is the MaxNumberOfOffsets parameter supposed to do? When I request max
> 10 offsets, I get back [12345, 0] (two offsets). Again, could this have to
> do with the number of log segments on my topic?
> Related PyKafka issue tickets for reference:
> https://github.com/Parsely/pykafka/issues/728
> https://github.com/Parsely/pykafka/issues/733
> Thanks for your help.
> --
> Emmett Butler | Software Engineer
> <http://www.parsely.com/?utm_source=Signature&utm_medium=
> emmett-butler&utm_campaign=Signature>

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message