spark-user mailing list archives

From German Schiavon
Subject ForeachBatch Structured Streaming
Date Wed, 14 Oct 2020 07:10:29 GMT

In the documentation it says:

   - By default, foreachBatch provides only at-least-once write guarantees.
   However, you can use the batchId provided to the function as way to
   deduplicate the output and get an exactly-once guarantee.

Taking the example snippet:

streamingDF.writeStream.foreachBatch { (batchDF: DataFrame, batchId: Long) =>
  batchDF.write.format(...).save(...)  // location 1
  batchDF.write.format(...).save(...)  // location 2
}

Let's assume I'm reading from Kafka. Does that mean that, by default, *batchDF* may
or may not contain duplicates?
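
For reference, here is a minimal sketch of one way the batchId from the quoted
documentation could be used to make the two writes idempotent. It is only an
illustration under stated assumptions, not the method the documentation
prescribes: the built-in "rate" source stands in for Kafka so that no broker is
needed, every path under /tmp/foreachbatch-sketch is hypothetical, and the
helper name batchPath is made up for this example. The idea is that each
micro-batch is written to a directory derived from its batchId in overwrite
mode, so a batch that is retried after a failure should replace its earlier
output rather than append a second copy.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ForeachBatchIdempotentSketch {
  // Hypothetical helper: one output directory per micro-batch.
  def batchPath(base: String, batchId: Long): String = s"$base/batch_id=$batchId"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("foreachBatch-idempotent-sketch")
      .master("local[*]")
      .getOrCreate()

    // Assumption: the "rate" test source stands in for the Kafka source.
    val streamingDF = spark.readStream.format("rate").load()

    // Typed as a Scala function so the Scala overload of foreachBatch is chosen.
    val writeBatch: (DataFrame, Long) => Unit = (batchDF, batchId) => {
      batchDF.persist() // avoid recomputing the batch for the second write
      // Overwrite mode: re-running the same batchId replaces the same directory.
      batchDF.write.mode("overwrite")
        .parquet(batchPath("/tmp/foreachbatch-sketch/location1", batchId))
      batchDF.write.mode("overwrite")
        .parquet(batchPath("/tmp/foreachbatch-sketch/location2", batchId))
      batchDF.unpersist()
    }

    val query = streamingDF.writeStream
      .foreachBatch(writeBatch)
      // Hypothetical checkpoint directory; needed to resume after a restart.
      .option("checkpointLocation", "/tmp/foreachbatch-sketch/checkpoint")
      .start()

    query.awaitTermination(10000L) // let the sketch run for about 10 seconds
    query.stop()
    spark.stop()
  }
}
```

Note that this only addresses duplicates caused by a micro-batch being
re-executed. If the same record was produced to the Kafka topic twice, it would
still appear twice in batchDF and would need separate handling.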

