flink-issues mailing list archives

From "Sergey (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-12294) Kafka connector, work with grouping partitions
Date Mon, 29 Apr 2019 12:08:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey updated FLINK-12294:
---------------------------
    Summary: Kafka connector, work with grouping partitions  (was: kafka consumer, data locality)

> Kafka connector, work with grouping partitions
> ----------------------------------------------
>
>                 Key: FLINK-12294
>                 URL: https://issues.apache.org/jira/browse/FLINK-12294
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / Kafka, Runtime / Coordination
>            Reporter: Sergey
>            Priority: Major
>              Labels: performance
>
> Add a flag (default false) that tells the connector whether topic partitions are already
> grouped by key. When the flag is set to true, the unnecessary shuffle/re-sorting step is
> skipped. As an example, say we have clients' payment transactions in a Kafka topic. Messages
> are partitioned by clientId (all transactions with the same clientId go to one Kafka topic
> partition), and the task is to find the maximum transaction per client in sliding windows.
> In map/reduce terms there is no need to shuffle data between all topic consumers; it may be
> worth doing the aggregation within each consumer instead, and gaining some speedup by
> increasing the number of executors working on each partition's data. With N messages in a
> partition, this takes just N operations instead of the N*ln(N) of the current implementation
> with shuffle/re-sorting. For windows with thousands of events, that is roughly a tenfold gain
> in execution speed (ln(N) is about 7 to 9 for N between 1,000 and 10,000).
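
To illustrate the idea in the description, here is a minimal sketch in plain Java. It is not Flink code and does not use the Kafka connector API. The class PartitionLocalMax, the Payment type and both method names are hypothetical, chosen only for this example. It assumes each inner list holds the events of one window read from one partition, and that a given clientId never appears in more than one partition. Under that assumption each consumer can compute its per-client maximum in a single pass, and the results can be combined without any exchange between consumers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionLocalMax {

    /** Hypothetical payment event; field names are illustrative only. */
    static final class Payment {
        final String clientId;
        final long amount;

        Payment(String clientId, long amount) {
            this.clientId = clientId;
            this.amount = amount;
        }
    }

    /** One pass over a single partition's window: N map updates, no sorting, no shuffle. */
    static Map<String, Long> maxPerClient(List<Payment> partitionWindow) {
        Map<String, Long> max = new HashMap<>();
        for (Payment p : partitionWindow) {
            max.merge(p.clientId, p.amount, Long::max);
        }
        return max;
    }

    /**
     * Combines the per-partition results. Because the partitions are assumed to be
     * grouped by clientId, the key sets are disjoint; a duplicate key means the
     * assumption was violated, so we fail loudly instead of merging silently.
     */
    static Map<String, Long> maxAcrossPartitions(List<List<Payment>> partitions) {
        Map<String, Long> result = new HashMap<>();
        for (List<Payment> partition : partitions) {
            for (Map.Entry<String, Long> e : maxPerClient(partition).entrySet()) {
                if (result.putIfAbsent(e.getKey(), e.getValue()) != null) {
                    throw new IllegalStateException(
                        "clientId " + e.getKey() + " appears in more than one partition");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Payment>> partitions = new ArrayList<>();
        partitions.add(Arrays.asList(
            new Payment("alice", 10L), new Payment("alice", 25L), new Payment("bob", 7L)));
        partitions.add(Arrays.asList(
            new Payment("carol", 40L), new Payment("carol", 3L)));
        System.out.println(maxAcrossPartitions(partitions));
    }
}
```

The disjointness check stands in for the proposed flag: if the flag were set to true on a topic that is not actually grouped by key, a purely local aggregation would give wrong results, so a real implementation would need some way to detect or document that case.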



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
