flink-issues mailing list archives

From "Sergey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-12294) kafka consumer, data locality
Date Tue, 23 Apr 2019 06:01:00 GMT
Sergey created FLINK-12294:

             Summary: kafka consumer, data locality
                 Key: FLINK-12294
                 URL: https://issues.apache.org/jira/browse/FLINK-12294
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Kafka, Runtime / Coordination
            Reporter: Sergey

Add a flag (default: false) indicating whether the topic partitions are already grouped
by key. When this parameter is set to true, the unnecessary shuffle/re-sorting operation is skipped.
As an example, say we have clients' payment transactions in a Kafka topic. The data is grouped by clientId
(transactions with the same clientId go to one Kafka topic partition), and the task is to
find the max transaction per client in sliding windows. In map/reduce terms there is no need
to shuffle data between all topic consumers; it may be worth doing the grouping within each consumer
to gain some speedup by increasing the number of executors working on each partition's data. With
N messages (in a partition), instead of N*ln(N) operations (the current implementation with shuffle/re-sorting)
it would be just N operations. For windows with thousands of events, the gain in execution could be tenfold.
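
To illustrate the idea, here is a minimal sketch in plain Python, not Flink code. It assumes that every message for a given clientId lives in the same partition, that windows are aligned to multiples of the slide, and that a window covers the range [start, start + size). The function name `max_per_client_in_partition` and the `(client_id, timestamp, amount)` tuple layout are hypothetical and chosen only for this example.

```python
def max_per_client_in_partition(events, size, slide):
    """Max amount per (client_id, window_start) for ONE partition.

    events: iterable of (client_id, timestamp, amount) tuples.
    size, slide: sliding-window length and step, in the timestamp's unit.
    Assumption: no other partition holds messages for these client ids,
    so no cross-partition shuffle is needed to get correct results.
    """
    result = {}
    for client_id, ts, amount in events:
        # Start of the latest window that contains ts.
        start = ts - (ts % slide)
        # Walk back over every window [start, start + size) containing ts.
        while start > ts - size:
            key = (client_id, start)
            if key not in result or amount > result[key]:
                result[key] = amount
            start -= slide
    return result


if __name__ == "__main__":
    partition = [("a", 0, 5), ("a", 7, 9), ("b", 3, 2)]
    for key, value in sorted(max_per_client_in_partition(partition, 10, 5).items()):
        print(key, value)
```

Each message is visited once and updates size/slide windows, so the work per partition is linear in the number of messages. Because keys never span partitions under the stated assumption, the per-partition results can be combined by a plain union without any merging step.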

This message was sent by Atlassian JIRA
