kafka-users mailing list archives

From João Peixoto <joao.harti...@gmail.com>
Subject What purpose serves the repartition topic?
Date Tue, 16 May 2017 23:44:05 GMT
Certain operations, such as "selectKey" or "map", require a repartition
topic. What purpose does this repartition topic serve?

Sample record: {"key": "a", ...}

Stream: source.selectKey((k, v) -> k.toUpperCase()).groupByKey() //...

From my understanding, the repartition topic guarantees that if we are
reading from partition N, the record with the new key will be written to the
same partition N of the repartition topic, which lets the same stream task
handle the same partition number end to end.
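A minimal simulation of the selectKey/groupByKey case, in plain Java, under stated assumptions: the partitioner is a toy (String.hashCode modulo the partition count), not Kafka's murmur2-based default partitioner, and each partition's list stands in for the one stream task that owns it. RepartitionDemo and its methods are hypothetical names. With 3 partitions, "a" and "A" start in different partitions, but both become "A" after toUpperCase; the sketch counts keys per partition with and without routing records by their new key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RepartitionDemo {

    // Toy partitioner (assumption): Kafka's default partitioner hashes the
    // serialized key with murmur2; String.hashCode keeps this dependency-free.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    static List<List<String>> emptyPartitions(int n) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    // Counts keys separately inside each partition, as a task that only
    // sees its own partition would.
    static List<Map<String, Integer>> countPerPartition(List<List<String>> parts) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (List<String> keys : parts) {
            Map<String, Integer> counts = new TreeMap<>();
            for (String k : keys) {
                counts.merge(k, 1, Integer::sum);
            }
            result.add(counts);
        }
        return result;
    }

    // Re-key with toUpperCase, but keep each record in the partition
    // chosen by its ORIGINAL key.
    static List<Map<String, Integer>> withoutRepartition(List<String> keys, int n) {
        List<List<String>> parts = emptyPartitions(n);
        for (String k : keys) {
            parts.get(partitionFor(k, n)).add(k.toUpperCase());
        }
        return countPerPartition(parts);
    }

    // Re-key with toUpperCase and route each record by its NEW key.
    static List<Map<String, Integer>> withRepartition(List<String> keys, int n) {
        List<List<String>> parts = emptyPartitions(n);
        for (String k : keys) {
            String newKey = k.toUpperCase();
            parts.get(partitionFor(newKey, n)).add(newKey);
        }
        return countPerPartition(parts);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "A", "b");
        System.out.println(withoutRepartition(keys, 3)); // [{}, {A=1}, {A=1, B=1}]
        System.out.println(withRepartition(keys, 3));    // [{B=1}, {}, {A=2}]
    }
}
```

In this model, skipping the repartition step leaves the count for "A" split across two partitions, while routing by the new key puts both records in the same partition. It models record placement only, not the Kafka Streams runtime.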

This seems relevant if the topology above is followed by:
/*...*/.toStream().leftJoin(kTable) //...
We are still processing the same partition number. If the source stream and
the kTable are co-partitioned, so is the repartition topic.
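Co-partitioning can be sketched with the same toy partitioner (an assumption; Kafka's default uses murmur2 on the serialized key). CoPartitioningDemo and coPartitioned are hypothetical names. Two topics count as co-partitioned here when every key maps to the same partition number in both; with equal partition counts and the same partitioner that always holds, while with different counts a key can land in different partition numbers.

```java
import java.util.List;

public class CoPartitioningDemo {

    // Toy partitioner (assumption), not Kafka's murmur2-based default.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // True if each key maps to the same partition number in a stream topic
    // with streamPartitions and a table topic with tablePartitions.
    static boolean coPartitioned(List<String> keys, int streamPartitions, int tablePartitions) {
        for (String key : keys) {
            if (partitionFor(key, streamPartitions) != partitionFor(key, tablePartitions)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("A", "B");
        System.out.println(coPartitioned(keys, 3, 3)); // true
        System.out.println(coPartitioned(keys, 3, 4)); // false: "A" -> 2 vs 1
    }
}
```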

However, in cases where the topology has no further operations such as
joins, the repartition topic seems useless.

There's a thread on this subject
<http://mail-archives.apache.org/mod_mbox/kafka-users/201705.mbox/%3CCAJikTEUHR=r0ika6vLF_y+QaJXg8f_Q19og_-s+Q-gozPqBgEw@mail.gmail.com%3E>,
specific to topics with only one partition. The argument there is that
repartitioning does not make sense on a topic with only one partition.
However, even with multiple partitions, if the stream is never joined with
anything else, repartitioning may not make sense for the reasons above.
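The single-partition argument from that thread can be checked with the same toy partitioner (an assumption, not Kafka's murmur2 default). SinglePartitionDemo and repartitionMovesNothing are hypothetical names. With one partition, every key maps to partition 0, so re-keying cannot move a record; the sketch models placement only, not whether Kafka Streams still creates the topic.

```java
import java.util.List;

public class SinglePartitionDemo {

    // Toy partitioner (assumption); Math.floorMod(x, 1) is always 0.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // True if re-keying with toUpperCase leaves every record in the partition
    // it already occupies, i.e. a repartition step would move nothing.
    static boolean repartitionMovesNothing(List<String> keys, int numPartitions) {
        for (String key : keys) {
            if (partitionFor(key, numPartitions) != partitionFor(key.toUpperCase(), numPartitions)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "A", "b");
        System.out.println(repartitionMovesNothing(keys, 1)); // true
        System.out.println(repartitionMovesNothing(keys, 3)); // false: "a" -> 1, "A" -> 2
    }
}
```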
