spark-user mailing list archives

From Felix Kizhakkel Jose <felixkizhakkelj...@gmail.com>
Subject How to generate unique incrementing identifier in a structured streaming dataframe
Date Tue, 13 Jul 2021 18:52:25 GMT
Hello,

I am using Spark Structured Streaming to sink data from Kafka to AWS S3. I
am wondering if it's possible to introduce a unique, incrementing
identifier for each record, as we do in an RDBMS (an incrementing long id)?
This would greatly help with range pruning when reading based on this ID.

Any thoughts? I have looked at monotonically_increasing_id, but it seems
it's not deterministic, and it won't ensure that new records get the next id
after the latest id that is already present in the storage (S3).
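
To illustrate the concern: Spark's documentation for
monotonically_increasing_id describes the current implementation as putting
the partition ID in the upper 31 bits and the record number within each
partition in the lower 33 bits. Below is a minimal sketch of that layout in
plain Python. It is only a model, not Spark code, and the function name
model_monotonic_ids is hypothetical.

```python
# Plain-Python model of the documented ID layout (illustrative, not Spark source):
# upper 31 bits = partition ID, lower 33 bits = record number within the partition.

def model_monotonic_ids(rows_per_partition):
    """Return the IDs each partition would emit under the documented layout."""
    ids = []
    for partition_id, row_count in enumerate(rows_per_partition):
        base = partition_id << 33  # start of this partition's ID range
        ids.extend(base + record_number for record_number in range(row_count))
    return ids


# Two partitions holding 3 and 2 rows respectively.
print(model_monotonic_ids([3, 2]))  # [0, 1, 2, 8589934592, 8589934593]
```

Under this model the IDs are unique and increasing but not consecutive, they
depend on how rows happen to be partitioned, and each new call starts again
from 0 rather than from the largest ID already written to storage.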

Regards,
Felix K Jose
