kafka-users mailing list archives

From Fabio Pardi <f.pa...@portavita.eu>
Subject Re: reliable way to count number of messages
Date Mon, 08 Jun 2020 13:18:53 GMT
Solved.

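For future reference: the offsets are 2 times the number of messages because of how (our)
producer works.

The producer is transactional and commits one transaction per message. Committing the
transaction writes a commit marker to the partition, and that marker takes up an offset as
well, so the offset is incremented by 2 for each message sent.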
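
To double-check the arithmetic, here is a minimal Python sketch that compares the offset span
reported by GetOffsetShell with the count printed by kafka-console-consumer. It only parses
the "topic:partition:offset" lines shown in this thread; the function names parse_offsets and
offsets_per_message are made up for illustration and are not part of any Kafka tooling.

```python
def parse_offsets(output):
    """Parse GetOffsetShell lines of the form topic:partition:offset."""
    offsets = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split from the right so only the last two fields are treated as numbers.
        topic, partition, offset = line.rsplit(":", 2)
        offsets[(topic, int(partition))] = int(offset)
    return offsets


def offsets_per_message(earliest_output, latest_output, consumed):
    """Return (offset span, offsets used per consumed message)."""
    earliest = parse_offsets(earliest_output)
    latest = parse_offsets(latest_output)
    # Span = latest minus earliest, summed over all partitions.
    span = sum(latest[key] - earliest[key] for key in latest)
    return span, span / consumed


# Numbers taken from this thread: --time -2 gave 0, --time -1 gave 47252,
# and the console consumer processed 23626 messages.
span, ratio = offsets_per_message("mytopic:0:0", "mytopic:0:47252", 23626)
print(span, ratio)  # 47252 2.0
```

A ratio of exactly 2.0 is what one would expect when every message is followed by one commit
marker. With a different number of messages per transaction the ratio would differ, so the
offset span alone is not a reliable message count on a topic written by a transactional
producer.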

regards,

fabio pardi

On 08/06/2020 13:45, Fabio Pardi wrote:
> Hello Liam,
>
> thanks for your reply.
>
> I'm still in the process of consolidating my Kafka knowledge, so I might have overlooked
> something in the current configuration or in the investigation of the current problem.
>
>
> About the problem, the strange thing is that the earliest offset is actually 0. My question
> was triggered because I disabled log compaction by passing 'log.cleaner.enable: "false"' to the
> brokers. Sorry for not mentioning it before.
>
>
> kafka-run-class kafka.tools.GetOffsetShell --broker-list [...]:9092 --topic pgo.fhir3.resource --time -2
> mytopic:0:0
>
> What looks suspicious to me, besides the offset and the number of messages not being
> identical, is that the former is exactly 2 times the latter.
>
>
> regards,
>
> fabio pardi
>
>
>
>
> On 08/06/2020 12:26, Liam Clarke-Hutchinson wrote:
>> Hi Fabio,
>>
>> -1 is shorthand for latest when passed as --time to GetOffsetShell (-2 is
>> earliest), so the output is telling you that the latest offset of partition
>> 0 of the topic is 47252.
>>
>> However, the earliest offset in the topic may not be zero - as topic
>> retention times are hit and messages removed, offsets aren't changed.
>>
>> So likely you'll find the earliest offset is 23626 or similar if you run
>> GetOffsetShell with --time -2.
>>
>> Cheers,
>>
>> Liam Clarke-Hutchinson
>>
>> On Mon, 8 Jun. 2020, 8:42 pm Fabio Pardi, <f.pardi@portavita.eu> wrote:
>>
>>> Hi there,
>>>
>>> I have one topic with one partition and I want to know how many messages
>>> there are in the topic.
>>>
>>> I noticed that if I run:
>>>
>>> kafka-console-consumer --topic mytopic --bootstrap-server [..]:9092 --from-beginning
>>>
>>> [..]
>>> Processed a total of 23626 messages
>>>
>>>
>>> If I instead run:
>>>
>>> kafka-run-class kafka.tools.GetOffsetShell --broker-list [..]:9092 --topic mytopic --time -1
>>>
>>> mytopic:0:47252
>>>
>>>
>>> So the 2 commands return different numbers and the first returns exactly
>>> half the amount the second does.
>>>
>>> Why do the 2 commands not return the same amount, and which one is right?
>>>
>>>
>>> kafka-console-consumer --version
>>> 5.4.1-ccs (Commit:fd1e543386b47352)
>>>
>>> kafka-run-class -version
>>> openjdk version "1.8.0_212"
>>> OpenJDK Runtime Environment (Zulu 8.38.0.13-CA-linux64) (build
>>> 1.8.0_212-b04)
>>> OpenJDK 64-Bit Server VM (Zulu 8.38.0.13-CA-linux64) (build 25.212-b04,
>>> mixed mode)
>>>
>>> regards,
>>>
>>> fabio pardi
>>>
>

