Thanks Shixiong. I also read in the documentation that duplicates might exist because of task retries:

The Kafka source doesn’t support transaction. You may see partial data or duplicated data if a Spark task fails.

We are using a Spark batch job to write a DataFrame to a Kafka topic, via the DataFrame write function with write.format(source = Kafka).
Does Spark provide a guarantee similar to the one it provides when saving a DataFrame to disk, i.e. that partial data is never written to Kafka? In other words, either the full DataFrame is saved or, if the job fails, no data is written to the Kafka topic.
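
For reference, the batch write in question looks roughly like the PySpark sketch below. This is only an illustration: the helper names (kafka_sink_options, write_batch_to_kafka) are made up for this post, and it assumes the DataFrame already has `key` and `value` columns, which is what the Kafka sink expects.

```python
def kafka_sink_options(bootstrap_servers, topic):
    """Build the options the Spark Kafka sink needs for a batch write.

    Both option names are the ones documented for the Kafka sink; the
    values passed in are placeholders supplied by the caller.
    """
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
    }


def write_batch_to_kafka(df, bootstrap_servers, topic):
    """Write a DataFrame to a Kafka topic as a one-off batch job.

    Assumption: `df` is a pyspark.sql.DataFrame with `key` and `value`
    columns. Each task sends its partition's rows to Kafka as it runs,
    so a failed and retried task can leave partial or duplicated records.
    """
    (
        df.selectExpr("CAST(key AS STRING) AS key",
                      "CAST(value AS STRING) AS value")
        .write
        .format("kafka")
        .options(**kafka_sink_options(bootstrap_servers, topic))
        .save()
    )
```

It would be called as write_batch_to_kafka(df, "host1:9092", "my_topic"), where the broker address and topic name are placeholders rather than our real settings.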


