spark-dev mailing list archives

From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: DataSourceV2 : Transactional Write support
Date Sat, 03 Aug 2019 19:48:47 GMT
> What you could try instead is intermediate output: insert into temporary
> tables in the executors, then move the inserted records to the final
> table in the driver (this move must be atomic)

I think this is the approach that other systems (maybe Sqoop?) have
taken. Each task inserts into its own independent temporary table, which
can be done quickly. Then, for the final commit operation, union the
temporary tables and insert the result into the final table. In a lot of
cases, JDBC databases can do that quickly as well because the data is
already on disk and just needs to be added to the final table.
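
A minimal sketch of that staging-table pattern, under stated assumptions:
Python's built-in sqlite3 stands in for a JDBC database, a single
connection stands in for the separate executor and driver connections,
and the names write_partition, commit_job, abort_job, final_table, and
the staging_<task_id> naming scheme are all hypothetical. It is meant to
illustrate the idea, not how any Spark connector implements it.

```python
import sqlite3


def write_partition(conn, task_id, rows):
    """Executor-side step: write one partition into its own staging table."""
    staging = f"staging_{task_id}"  # hypothetical naming scheme
    with conn:  # commits on success, rolls back on error
        conn.execute(f"CREATE TABLE {staging} (id INTEGER, value TEXT)")
        conn.executemany(f"INSERT INTO {staging} VALUES (?, ?)", rows)
    # The table name plays the role of the commit message sent to the driver.
    return staging


def commit_job(conn, staging_tables):
    """Driver-side step: move all staged rows into the final table atomically."""
    union = " UNION ALL ".join(
        f"SELECT id, value FROM {t}" for t in staging_tables
    )
    with conn:  # one transaction: either every staged row lands or none does
        conn.execute(f"INSERT INTO final_table (id, value) {union}")
    abort_job(conn, staging_tables)  # staging tables are no longer needed


def abort_job(conn, staging_tables):
    """Cleanup step: drop staging tables; the final table is left untouched."""
    for t in staging_tables:
        with conn:
            conn.execute(f"DROP TABLE IF EXISTS {t}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE final_table (id INTEGER, value TEXT)")
    partitions = {0: [(1, "a"), (2, "b")], 1: [(3, "c")]}
    staged = [write_partition(conn, tid, rows) for tid, rows in partitions.items()]
    commit_job(conn, staged)
    print(conn.execute("SELECT COUNT(*) FROM final_table").fetchone()[0])
```

If any task fails before commit_job runs, calling abort_job on the
staging tables written so far leaves the final table unchanged, which is
the all-or-nothing behavior asked about in the original question.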

On Fri, Aug 2, 2019 at 7:25 PM Jungtaek Lim <kabhwan@gmail.com> wrote:

> I asked a similar question about end-to-end exactly-once with Kafka, and
> you're correct that distributed transactions are not supported.
> Introducing a distributed transaction protocol like "two-phase commit"
> would require huge changes to the Spark codebase, and the feedback was
> not positive.
>
> What you could try instead is intermediate output: insert into temporary
> tables in the executors, then move the inserted records to the final
> table in the driver (this move must be atomic).
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Sat, Aug 3, 2019 at 4:56 AM Shiv Prashant Sood <shivprashant@gmail.com>
> wrote:
>
>> All,
>>
>> I understood that DataSourceV2 supports transactional writes and wanted
>> to implement that in the JDBC DataSource V2 connector (PR#25211
>> <https://github.com/apache/spark/pull/25211>).
>>
>> I don't see how this is feasible for a JDBC-based connector. The
>> framework suggests that each EXECUTOR send a commit message to the
>> DRIVER, and that the actual commit only be done by the DRIVER after
>> receiving all commit confirmations. This will not work for JDBC, as
>> commits have to happen on the JDBC Connection, which is maintained by
>> the EXECUTORS, and a JDBC Connection is not serializable, so it cannot
>> be sent to the DRIVER.
>>
>> Am I right in thinking that this cannot be supported for JDBC? My goal is
>> to either fully write or roll back the dataframe write operation.
>>
>> Thanks in advance for your help.
>>
>> Regards,
>> Shiv
>>
>
>
> --
> Name : Jungtaek Lim
> Blog : http://medium.com/@heartsavior
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior
>


-- 
Ryan Blue
Software Engineer
Netflix
