Thanks Shixiong. I also read in the documentation that duplicates may occur because of task retries.

On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu wrote:
The Kafka source doesn’t support transactions. You may see partial or duplicated data if a Spark task fails.
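
Since duplicates from task retries cannot be ruled out on the write side, one common workaround is to de-duplicate when reading the topic back. The following is only a minimal sketch, assuming every record carries a unique key set by the producer; the function name `deduplicate` and the (key, value) tuple layout are hypothetical, not part of Spark or Kafka.

```python
from typing import Iterable, Iterator, Tuple


def deduplicate(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield each (key, value) record once, keeping the first occurrence of a key.

    Assumption: the key uniquely identifies a logical record, so a second
    record with the same key is a duplicate produced by a retried task.
    """
    seen = set()
    for key, value in records:
        if key in seen:
            # Already emitted this key; skip the retried copy.
            continue
        seen.add(key)
        yield key, value
```

This keeps every key in memory, so it only suits bounded reads; for large topics the same idea would need a bounded or external key store.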

On Wed, Mar 27, 2019 at 1:15 AM hemant singh wrote:
We are using Spark batch to write a DataFrame to a Kafka topic, using the Spark write function with write.format(source = Kafka).
Does Spark provide a guarantee similar to the one it provides when saving a DataFrame to disk, namely that partial data is never written to Kafka? That is, either the full DataFrame is saved or, if the job fails, no data is written to the Kafka topic.
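
For reference, the batch write in question would look roughly like the sketch below. It assumes a PySpark DataFrame that already has `key` and `value` columns; the helper name `write_batch_to_kafka` and its parameters are made up for illustration. The `kafka` format and the `kafka.bootstrap.servers` and `topic` options are the ones documented in the Spark Kafka integration guide.

```python
def write_batch_to_kafka(df, bootstrap_servers, topic):
    """Write a batch DataFrame to a Kafka topic (sketch, not transactional).

    Assumptions: `df` is a pyspark.sql.DataFrame with `key` and `value`
    columns, and the spark-sql-kafka package is on the classpath.
    """
    (
        # The Kafka sink expects string or binary `key` and `value` columns.
        df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
        .write
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", topic)
        .save()
    )
```

A call such as `write_batch_to_kafka(df, "host1:9092", "events")` (placeholder broker and topic) gives no all-or-nothing guarantee: records sent by a task before it fails stay in the topic.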


Best Regards,