spark-dev mailing list archives

From Jungtaek Lim <kabh...@gmail.com>
Subject Re: DataSourceV2 : Transactional Write support
Date Sat, 03 Aug 2019 02:25:07 GMT
I asked a similar question about end-to-end exactly-once with Kafka, and you're
correct: distributed transactions are not supported. Introducing a distributed
transaction protocol such as two-phase commit would require huge changes to the
Spark codebase, and the feedback on that idea was not positive.

What you could try instead is intermediate output: insert into a temporary
table from the executors, then move the inserted records to the final table
in the driver (this move must be atomic).

Thanks,
Jungtaek Lim (HeartSaVioR)

On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood <shivprashant@gmail.com>
wrote:

> All,
>
> I understood that DataSourceV2 supports transactional writes and wanted to
> implement that in the JDBC DataSource V2 connector (PR#25211
> <https://github.com/apache/spark/pull/25211>).
>
> I don't see how this is feasible for a JDBC-based connector. The framework
> suggests that each EXECUTOR send a commit message to the DRIVER, and that the
> actual commit should only be done by the DRIVER after receiving all commit
> confirmations. This will not work for JDBC, as commits have to happen on the
> JDBC connection, which is maintained by the EXECUTORS, and a JDBC connection
> is not serializable, so it cannot be sent to the DRIVER.
>
> Am I right in thinking that this cannot be supported for JDBC? My goal is
> to either fully write or roll back the dataframe write operation.
>
> Thanks in advance for your help.
>
> Regards,
> Shiv
>


-- 
Name : Jungtaek Lim
Blog : http://medium.com/@heartsavior
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior
