kafka-users mailing list archives

From Mike Kaplinskiy <m...@ladderlife.com.INVALID>
Subject Parallel computation of windows in Flink
Date Sat, 08 Jun 2019 19:44:10 GMT
Hi everyone,

I’m using a Kafka source with a lot of watermark skew (new partitions
were added to the topic over time). The sink is a
FileIO.write().withNumShards(1), which gives roughly one file per day, plus an
early trigger so that each file holds at most 40,000 records. Unfortunately, it
looks like a single thread is writing the files for all of the days, instead of
writing multiple days' files in parallel. Is there anything I can do to
parallelize this? All of this is on the Flink runner.


