flink-issues mailing list archives

From "Tzu-Li (Gordon) Tai (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-4722) Consumer group concept not working properly with FlinkKafkaConsumer09
Date Tue, 04 Oct 2016 03:05:20 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15544170#comment-15544170 ]

Tzu-Li (Gordon) Tai edited comment on FLINK-4722 at 10/4/16 3:04 AM:
---------------------------------------------------------------------

One last comment:
Iterating over all data shouldn't be required. Say you set a parallelism of 3 (i.e., 3 subtasks) for
the FlinkKafkaConsumer. Each subtask will be assigned a single partition and will read only that
partition. Using a {{KeyedDeserializationSchema}}, you can key each record with its partition
id "within the source", which means you won't need another map function to do the keying.
The data that comes out of the FlinkKafkaConsumer will then be {{(partition, matrixData)}} tuples.
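Something along these lines should work (just a sketch; the class name is a placeholder and I'm assuming a plain string payload here, so substitute your actual matrix data type):

{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.streaming.util.serialization.KeyedDeserializationSchema;

// Tags every record with the Kafka partition it was read from, inside the source.
public class PartitionTaggingSchema
        implements KeyedDeserializationSchema<Tuple2<Integer, String>> {

    @Override
    public Tuple2<Integer, String> deserialize(
            byte[] messageKey, byte[] message, String topic, int partition, long offset) {
        // The consumer hands us the partition id, so no extra map function
        // is needed downstream to attach it.
        return new Tuple2<>(partition, new String(message, StandardCharsets.UTF_8));
    }

    @Override
    public boolean isEndOfStream(Tuple2<Integer, String> nextElement) {
        return false; // keep consuming indefinitely
    }

    @Override
    public TypeInformation<Tuple2<Integer, String>> getProducedType() {
        return new TupleTypeInfo<>(BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);
    }
}
{code}

You pass an instance of this schema to the {{FlinkKafkaConsumer09}} constructor in place of a plain {{DeserializationSchema}}.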

The only "iteration" would probably be a "keyBy" operation after the source, to transform the
stream into a {{KeyedStream}} before windowing. I think use cases like yours would
benefit if the stream returned by the source were already a {{KeyedStream}}, to take advantage
of data that is pre-partitioned outside of Flink. I'll think a bit about this idea (it would be great
if you could let me know what you think of it)!
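Wiring it up would look roughly like this (again just a sketch; the topic name, properties, and the windowed reduce are placeholders for your actual job):

{code:java}
import java.util.Properties;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09;

public class PartitionKeyedWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "myGroup");

        DataStream<Tuple2<Integer, String>> stream = env.addSource(
                new FlinkKafkaConsumer09<>("myTopic", new PartitionTaggingSchema(), props));

        // keyBy on the partition field (f0) turns the stream into a KeyedStream,
        // so records from the same Kafka partition stay together when windowed.
        stream.keyBy(0)
              .timeWindow(Time.seconds(10))
              .reduce((a, b) -> new Tuple2<>(a.f0, a.f1 + b.f1)); // placeholder aggregation

        env.execute("partition-keyed windows");
    }
}
{code}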

In any case, writing your own custom consumer also works for now :)


> Consumer group concept not working properly with FlinkKafkaConsumer09  
> -----------------------------------------------------------------------
>
>                 Key: FLINK-4722
>                 URL: https://issues.apache.org/jira/browse/FLINK-4722
>             Project: Flink
>          Issue Type: Bug
>          Components: Kafka Connector
>    Affects Versions: 1.1.2
>            Reporter: Sudhanshu Sekhar Lenka
>
> When a Kafka topic has 3 partitions and 3 FlinkKafkaConsumer09 instances are connected to that
> same topic with the "group.id" property set to "myGroup", each Flink consumer still gets all of
> the data pushed to all 3 partitions. It works properly with the normal Java consumer: each
> consumer gets data from its assigned partitions only.
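
For reference, the partition splitting seen with the plain Java consumer comes from Kafka's broker-side group coordinator: consumers that subscribe with the same {{group.id}} are each assigned a disjoint subset of the topic's partitions. A minimal sketch with the Kafka 0.9 client (broker address and topic name are placeholders):

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PlainGroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "myGroup");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // The group coordinator splits the topic's partitions among all
        // consumers that subscribe with the same group.id.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("myTopic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
            }
        }
    }
}
{code}

The FlinkKafkaConsumer09 does not take part in this group coordination: it discovers the topic's partitions and assigns them to its parallel subtasks itself, using the {{group.id}} only for committing offsets. That is why each Flink consumer instance reads all partitions regardless of the shared group.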



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
