kafka-users mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Guozhang Wang <wangg...@gmail.com>
Subject Re: MAX_TASK_IDLE_MS_CONFIG not working?
Date Fri, 14 Dec 2018 16:57:31 GMT
Hello Dmitry,

There are generally speaking two sources of out-of-ordering data from the
viewpoint of Kafka Streams, and you can find them on the web docs here:
https://kafka.apache.org/21/documentation/streams/core-concepts#streams_out_of_ordering

As Matthias mentioned, if the source topics already contain out of ordering
data within a single topic-partition, or because of repartitioning, the
intermediate topic-partition may contain out-of-ordering data; such cases
are not covered by the newly introduced config in 2.1.

Guozhang

On Fri, Dec 14, 2018 at 12:43 AM Matthias J. Sax <matthias@confluent.io>
wrote:

> If you have out-of-order data, there is no guarantee that the current
> record has larger timestamp than previous record. Data is still
> processed in offset order.
>
> Also, you use selectKey() and groupByKey() thus triggering a
> repartioning that may introduce out-of-order data downstream even if all
> your original input data is ordered by timestamp.
>
>
> -Matthias
>
> On 12/13/18 10:53 PM, Dmitry Minkovsky wrote:
> > I just discovered that 2.1.0 included MAX_TASK_IDLE_MS_CONFIG and dropped
> > everything to play with it! Exciting!
> >
> > I built the following topology, attempting to synchronize two
> > topic-partitions:
> > https://gist.github.com/dminkovsky/45aa29aefefad564f9663cd36ad21ce1.
> >
> > However, I keep failing at
> >
> https://gist.github.com/dminkovsky/45aa29aefefad564f9663cd36ad21ce1#file-gistfile1-txt-L42
> .
> > Am I misunderstanding how this feature works? The two topic-partitions
> > appear to be grouped together in a single task:
> >
> > [2018-12-13 16:38:29,082] DEBUG
> > (org.apache.kafka.streams.processor.internals.TaskManager:120)
> > stream-thread
> > [migration-fed2f8fa-150e-494c-b1c8-0594f6d52a29-StreamThread-1] Adding
> > assigned tasks as active: {0_0=[join-requests-0,
> > settings-update-requests-0],
> > 1_0=[migration-KSTREAM-AGGREGATE-STATE-STORE-0000000008-repartition-0]}
> >
> >
> > So shouldn't this new config enable effectively synchronizing them?
> >
> >
> > Thank you,
> >
> > Dmitry
> >
>
>

-- 
-- Guozhang

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message