spark-dev mailing list archives

From Jungtaek Lim <>
Subject Re: DataSourceV2 : Transactional Write support
Date Sat, 03 Aug 2019 02:25:07 GMT
I asked a similar question about end-to-end exactly-once with Kafka, and you're
correct that distributed transactions are not supported. Introducing a distributed
transaction protocol like "two-phase commit" would require huge changes to the
Spark codebase, and the feedback was not positive.

What you could try instead is an intermediate output: insert into a temporary
table from the executors, then move the inserted records to the final table in
the driver (that move must be atomic).
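To make the staging-table idea concrete, here is a minimal sketch of the pattern using SQLite (stdlib `sqlite3`) purely for illustration; the table names, the in-memory database, and the simulated "executor" partitions are all hypothetical stand-ins, not Spark or JDBC connector code. The point is only the shape of the protocol: executors append to a staging table, and the driver promotes all staged rows to the final table in a single transaction, so the final table sees everything or nothing.

```python
import sqlite3

# Hypothetical setup: staging_table and final_table are illustrative names,
# and the in-memory SQLite database stands in for the real JDBC target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_table (id INTEGER, value TEXT)")
conn.execute("CREATE TABLE staging_table (id INTEGER, value TEXT)")

# Phase 1: each "executor" writes its partition into the staging table.
# These writes need not be atomic with respect to one another.
partitions = [[(1, "a"), (2, "b")], [(3, "c")]]
for partition in partitions:
    conn.executemany("INSERT INTO staging_table VALUES (?, ?)", partition)
conn.commit()

# Phase 2: the "driver" promotes the staged rows in ONE transaction.
# If anything fails mid-way, the rollback leaves final_table untouched.
try:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO final_table SELECT * FROM staging_table")
        conn.execute("DELETE FROM staging_table")
except sqlite3.Error:
    pass  # final_table is unchanged on failure

final_count = conn.execute("SELECT COUNT(*) FROM final_table").fetchone()[0]
staged_count = conn.execute("SELECT COUNT(*) FROM staging_table").fetchone()[0]
print(final_count, staged_count)  # 3 0
```

In a real connector the per-partition inserts would happen on executor-side JDBC connections, while only the final `INSERT ... SELECT` plus cleanup runs on a driver-side connection, which sidesteps the need to ship any `Connection` object between JVMs.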

Jungtaek Lim (HeartSaVioR)

On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood <> wrote:

> All,
> I understood that DataSourceV2 supports transactional writes and wanted to
> implement that in the JDBC DataSourceV2 connector ( PR#25211
> <> ).
> I don't see how this is feasible for a JDBC-based connector. The framework
> suggests that the EXECUTORS send commit messages to the DRIVER, and that the
> actual commit should only be done by the DRIVER after receiving all commit
> confirmations. This will not work for JDBC, as commits have to happen on the
> JDBC Connection, which is maintained by the EXECUTORS, and a JDBC Connection
> is not serializable, so it cannot be sent to the DRIVER.
> Am I right in thinking that this cannot be supported for JDBC? My goal is
> to either fully write or roll back the dataframe write operation.
> Thanks in advance for your help.
> Regards,
> Shiv

