spark-user mailing list archives

From Femi Anthony <>
Subject Spark Stateful Streaming - add counter column
Date Wed, 23 Jan 2019 15:06:12 GMT

I have a Spark Streaming process that consumes records from a Kafka topic, processes them,
and sends them to a producer that publishes them on another topic. I would like to add a sequence-number
column that identifies records sharing the same key and is incremented each time
that key recurs. For example, if the output sent to the producer is

Key, col1, col2, seqnum 
A, 67, dog, 1 
B, 56, cat, 1 
C, 89, fish, 1
then, if A and B recur within a reasonable time interval, Spark should produce the following:

A, 67, dog, 2 
B, 56, cat, 2
etc.

How would I do that? I suspect this pattern comes up frequently, but I haven't found any examples.
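One way to frame the problem, as a minimal sketch rather than a Spark API: keep per-key state holding the last seqnum issued and the time the key was last seen. When a record arrives, emit 1 if the key is new or its state is older than the timeout, otherwise emit the previous seqnum plus 1. In Spark this per-key state would live in a stateful operator such as mapGroupsWithState / flatMapGroupsWithState (Structured Streaming) or updateStateByKey / mapWithState (DStreams). The plain-Python version below only illustrates the update logic. The record layout (key, col1, col2, event time in seconds), the 600-second timeout standing in for "a reasonable time interval", and the names SeqNumAssigner / process_batch are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical input record: (key, col1, col2, event_time_seconds).
Record = Tuple[str, int, str, float]
# Output record: (key, col1, col2, seqnum).
OutRecord = Tuple[str, int, str, int]


@dataclass
class KeyState:
    """Per-key state, analogous to what a Spark stateful operator would keep."""
    seqnum: int
    last_seen: float


class SeqNumAssigner:
    """Hypothetical helper: assigns an incrementing seqnum per key.

    A key not seen for more than `timeout_s` seconds (an assumed value
    for "a reasonable time interval") starts again at 1.
    """

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.state: Dict[str, KeyState] = {}

    def process_batch(self, batch: Iterable[Record]) -> List[OutRecord]:
        out: List[OutRecord] = []
        # Order by event time so duplicates inside one micro-batch
        # receive increasing seqnums (sort is stable for equal times).
        for key, col1, col2, ts in sorted(batch, key=lambda r: r[3]):
            prev = self.state.get(key)
            if prev is None or ts - prev.last_seen > self.timeout_s:
                seq = 1  # first sighting, or the previous state has expired
            else:
                seq = prev.seqnum + 1
            self.state[key] = KeyState(seqnum=seq, last_seen=ts)
            out.append((key, col1, col2, seq))
        return out


if __name__ == "__main__":
    assigner = SeqNumAssigner(timeout_s=600)
    print(assigner.process_batch([("A", 67, "dog", 0.0), ("B", 56, "cat", 1.0), ("C", 89, "fish", 2.0)]))
    print(assigner.process_batch([("A", 67, "dog", 60.0), ("B", 56, "cat", 61.0)]))
```

Feeding it the example above, the first batch yields seqnum 1 for A, B and C, and the second batch yields seqnum 2 for A and B. Note that this sketch never evicts expired keys; it only resets them when they recur. In a real Spark job you would probably want the operator's state timeout (e.g. GroupStateTimeout.ProcessingTimeTimeout with flatMapGroupsWithState) so stale keys are dropped, and Spark's checkpointing so the counters survive a restart.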
