spark-user mailing list archives

From hemant singh <hemant2...@gmail.com>
Subject Re: Spark Kafka Batch Write guarantees
Date Mon, 01 Apr 2019 18:32:46 GMT
Thanks, Shixiong. I also read in the documentation that duplicates might
occur because of task retries.
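
To illustrate why a retried task produces duplicates, here is a minimal
pure-Python sketch. It does not use Spark or Kafka: a plain list stands in
for the topic, and every function name is hypothetical. It also assumes each
record carries a unique key, so a reader can drop repeats; that is one
possible mitigation, not something Spark does automatically.

```python
# A plain list stands in for a Kafka topic; no Spark or Kafka is involved.
# All names here are hypothetical and exist only for illustration.

def write_partition(records, topic, fail_after=None):
    """Append records to `topic` one by one.

    Raises RuntimeError once `fail_after` records have been written,
    mimicking a task that dies partway through its partition.
    """
    for i, rec in enumerate(records):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated task failure")
        topic.append(rec)


def write_with_retry(records, topic, fail_after):
    """First attempt fails partway; the retry resends the whole partition."""
    try:
        write_partition(records, topic, fail_after=fail_after)
    except RuntimeError:
        # Records already appended are not rolled back, so they get resent.
        write_partition(records, topic)


def deduplicate(messages):
    """Keep the first message seen for each unique key, preserving order."""
    seen = set()
    unique = []
    for key, value in messages:
        if key not in seen:
            seen.add(key)
            unique.append((key, value))
    return unique


records = [("id-1", "a"), ("id-2", "b"), ("id-3", "c")]
topic = []
write_with_retry(records, topic, fail_after=2)
deduped = deduplicate(topic)
```

In this sketch the topic ends up holding the two records from the failed
attempt plus all three from the retry, and de-duplicating on the key
recovers the original three.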

On Mon, 1 Apr 2019 at 9:43 PM, Shixiong(Ryan) Zhu <shixiong@databricks.com>
wrote:

> The Kafka sink does not support transactions. You may see partial data or
> duplicated data if a Spark task fails.
>
> On Wed, Mar 27, 2019 at 1:15 AM hemant singh <hemant2184@gmail.com> wrote:
>
>> We are using a Spark batch job to write a DataFrame to a Kafka topic,
>> via the DataFrame write function with write.format("kafka").
>> Does Spark provide a guarantee similar to the one it gives when saving a
>> DataFrame to disk, i.e. that partial data is never written to Kafka:
>> either the full DataFrame is saved or, if the job fails, no data is
>> written to the Kafka topic?
>>
>> Thanks.
>>
> --
>
> Best Regards,
> Ryan
>
