spark-user mailing list archives

From Akila Wajirasena <akila.wajiras...@gmail.com>
Subject Structured streaming flatMapGroupWithState results out of order messages when reading from Kafka
Date Tue, 09 Apr 2019 09:37:02 GMT
Hi

I have a Kafka topic that is already loaded with data. I consume it in a
streaming manner with a stateful Structured Streaming pipeline that uses
flatMapGroupsWithState.

However, when I set the shuffle partition count to a value greater than 1,
some messages arrive out of order within each of my GroupState instances.
Is this the expected behavior, or is message ordering guaranteed when using
flatMapGroupsWithState with a Kafka source?

This is my pipeline:

Kafka => groupByKey (key from the Kafka schema) => flatMapGroupsWithState =>
parquet

When I print the Kafka offsets for each key inside my state update
function, they are not in increasing order. I am using Spark 2.3.3.
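
A minimal sketch of a pipeline with this shape, which also records whether
each key's offsets arrive in increasing order, might look like the
following. It assumes Scala and the Spark 2.3 Structured Streaming API. The
broker address, topic name, output and checkpoint paths, and the Event,
OffsetState and Output types are hypothetical placeholders, not the actual
job. Comparing offsets per key is only meaningful if every record for a key
sits in the same Kafka partition, which is also an assumption here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical record, state and output types, for illustration only.
case class Event(key: String, value: String, partition: Int, offset: Long)
case class OffsetState(lastOffset: Long)
case class Output(key: String, offset: Long, inOrder: Boolean)

object PipelineSketch {

  // State update function: marks each record as in order when its offset is
  // greater than the highest offset seen so far for the same key.
  // Assumes all records of a key come from a single Kafka partition.
  def update(
      key: String,
      events: Iterator[Event],
      state: GroupState[OffsetState]): Iterator[Output] = {
    var last = state.getOption.map(_.lastOffset).getOrElse(-1L)
    val out = events.map { e =>
      val ok = e.offset > last
      last = math.max(last, e.offset)
      Output(key, e.offset, ok)
    }.toList // force evaluation before the state is updated
    state.update(OffsetState(last))
    out.iterator
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()
    import spark.implicits._

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("subscribe", "events")                       // placeholder topic
      .option("startingOffsets", "earliest")               // topic is preloaded
      .load()
      .selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "partition",
        "offset")
      .as[Event]

    val query = events
      .groupByKey(_.key)
      .flatMapGroupsWithState(OutputMode.Append(), GroupStateTimeout.NoTimeout())(update _)
      .writeStream
      .format("parquet")
      .option("path", "/tmp/pipeline-sketch/out")               // placeholder
      .option("checkpointLocation", "/tmp/pipeline-sketch/chk") // placeholder
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

Rows written with inOrder = false would correspond to the out-of-order
offsets described above.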

Thanks & Regards,
Akila
