flink-issues mailing list archives

From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-4723) Unify definition of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer
Date Sat, 01 Oct 2016 10:50:20 GMT
Tzu-Li (Gordon) Tai created FLINK-4723:

             Summary: Unify definition of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer
                 Key: FLINK-4723
                 URL: https://issues.apache.org/jira/browse/FLINK-4723
             Project: Flink
          Issue Type: Improvement
          Components: Kafka Connector
            Reporter: Tzu-Li (Gordon) Tai
            Assignee: Tzu-Li (Gordon) Tai
             Fix For: 1.2.0, 1.1.3

The proper "definition" of the offsets committed back to Kafka / ZK should be "the next offset
that consumers should read (in Kafka terms, the 'position')".

This is already fixed for the 0.9 consumer by FLINK-4618, which increments the offsets that the
0.9 consumer commits back to Kafka by 1, so that the internal {{KafkaConsumer}} picks up the
correct start position when committed offsets are present. This fix was required because the
start position is picked up implicitly by the Kafka 0.9 APIs.

However, since the 0.8 consumer handles offset committing and start position using Flink's
own {{ZookeeperOffsetHandler}} and not Kafka's high-level APIs, the 0.8 consumer did not
require a fix.

I propose to still unify the behaviour of committed offsets across 0.8 and 0.9 to the definition
above.

Otherwise, if a user first uses the 0.8 consumer to read data, leaving Flink-committed offsets
in ZK, and then uses a high-level 0.8 Kafka consumer to read the same topic in a non-Flink
application, the first record will be a duplicate (because, as described above, Kafka high-level
consumers expect the committed offset to be "the next record to process" and not "the last
processed record").
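The duplicate can be reproduced with a small in-memory sketch. Nothing here is Flink or Kafka
code: the class {{DuplicateOnHandover}}, the method {{resumeFrom}} and the four-record log are
hypothetical, and the only assumption carried over from the description is that a high-level
consumer starts reading at the committed offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Flink or Kafka.
final class DuplicateOnHandover {
    private DuplicateOnHandover() {}

    /** Resumes like a high-level consumer: the first record read is the one AT committedOffset. */
    static List<String> resumeFrom(List<String> log, long committedOffset) {
        return new ArrayList<>(log.subList((int) committedOffset, log.size()));
    }

    public static void main(String[] args) {
        // The list index plays the role of the partition offset (0..3).
        List<String> log = Arrays.asList("a", "b", "c", "d");
        long lastProcessed = 1; // the first application has already processed "a" and "b"

        // Committing "the last processed record": "b" is read a second time.
        System.out.println(resumeFrom(log, lastProcessed));     // prints [b, c, d]

        // Committing "the next record to process": reading resumes cleanly at "c".
        System.out.println(resumeFrom(log, lastProcessed + 1)); // prints [c, d]
    }
}
```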

This requires incrementing the ZK offsets committed by the 0.8 consumer by 1 as well, and
changing how Flink's internal offsets are set from the acquired ZK offsets.
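As a rough sketch of the two sides of that change, the class below uses an in-memory map as a
stand-in for ZK. {{InMemoryOffsetStore}} and its methods are hypothetical and are not the API of
{{ZookeeperOffsetHandler}}. The sketch also assumes that Flink internally tracks the offset of
the last processed record, so the committed value has to be translated in both directions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical stand-in for an external offset store such as ZK; not Flink code.
final class InMemoryOffsetStore {
    private final Map<Integer, Long> committed = new HashMap<>();

    /** Commit side: store the next offset to read, i.e. last processed + 1. */
    void commit(int partition, long lastProcessedOffset) {
        committed.put(partition, lastProcessedOffset + 1);
    }

    /** Raw committed value, as any other consumer reading the store would see it. */
    OptionalLong committedOffset(int partition) {
        Long value = committed.get(partition);
        return value == null ? OptionalLong.empty() : OptionalLong.of(value);
    }

    /** Restore side: translate the committed value back to "last processed" (assumed internal form). */
    OptionalLong restoreLastProcessed(int partition) {
        OptionalLong value = committedOffset(partition);
        return value.isPresent() ? OptionalLong.of(value.getAsLong() - 1) : OptionalLong.empty();
    }
}
```

With this convention, committing after processing offset 41 stores 42, which is what a non-Flink
consumer expects to find, while a restoring Flink job still recovers 41 as its last processed
offset.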

This message was sent by Atlassian JIRA
