spark-dev mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: [DISCUSS] Add close() on DataWriter interface
Date Thu, 12 Dec 2019 00:23:48 GMT
> Is this something that would be exposed/relevant to the Python API? Or is
> this just for people implementing their own Spark data source?

It's the latter, and it also helps simplify the built-in data sources
(I found the need while working on
https://github.com/apache/spark/pull/26845)

On Thu, Dec 12, 2019 at 3:53 AM Nicholas Chammas <nicholas.chammas@gmail.com>
wrote:

> Is this something that would be exposed/relevant to the Python API? Or is
> this just for people implementing their own Spark data source?
>
> On Wed, Dec 11, 2019 at 12:35 AM Jungtaek Lim <
> kabhwan.opensource@gmail.com> wrote:
>
>> Hi devs,
>>
>> I'd like to propose adding an explicit close() method to DataWriter as
>> the place for resource cleanup.
>>
>> The rationale for the proposal is the lifecycle of DataWriter. If the
>> scaladoc of DataWriter is correct, the lifecycle of a DataWriter
>> instance ends at either commit() or abort(). That leads data source
>> implementors to feel they can place resource cleanup in both methods,
>> but abort() can be called when commit() fails, so they have to ensure
>> they don't clean up twice if the cleanup is not idempotent.
>>
>> I've checked some callers to see whether they can apply
>> "try-catch-finally" to ensure close() is called at the end of the
>> DataWriter lifecycle, and it looks like they can, but I might be
>> missing something.
>>
>> What do you think? It would be a backward-incompatible change, but given
>> that the interface is marked as Evolving and we're already making
>> backward-incompatible changes in Spark 3.0, I feel it may not matter.
>>
>> Would love to hear your thoughts.
>>
>> Thanks in advance,
>> Jungtaek Lim (HeartSaVioR)
>>
>>
>>
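
For illustration, here is a rough sketch of the "try-catch-finally" caller pattern described in the proposal. It is not Spark code: SketchWriter, RecordingWriter, and runTask are hypothetical names made up for this example, and the interface is a simplified stand-in rather than the actual DataWriter. It only shows that when commit() fails, abort() runs and cleanup still happens exactly once, in close().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a writer interface with an explicit
// close(). This is NOT Spark's actual DataWriter interface.
interface SketchWriter<T> {
    void write(T record);
    void commit();
    void abort();
    void close(); // the single place for resource cleanup
}

// Hypothetical writer that records lifecycle calls; commit() can be made to fail.
class RecordingWriter implements SketchWriter<String> {
    final List<String> events = new ArrayList<>();
    private final boolean failOnCommit;

    RecordingWriter(boolean failOnCommit) {
        this.failOnCommit = failOnCommit;
    }

    @Override
    public void write(String record) {
        events.add("write:" + record);
    }

    @Override
    public void commit() {
        if (failOnCommit) {
            throw new IllegalStateException("commit failed");
        }
        events.add("commit");
    }

    @Override
    public void abort() {
        // No cleanup here, so there is nothing to guard against double-cleanup.
        events.add("abort");
    }

    @Override
    public void close() {
        events.add("close");
    }
}

class DataWriterCloseSketch {
    // Caller-side pattern: commit on success, abort on failure, and close()
    // in finally so it runs once on both paths. A real caller would likely
    // rethrow the failure; this sketch returns a flag to stay short.
    static <T> boolean runTask(SketchWriter<T> writer, List<T> records) {
        try {
            for (T record : records) {
                writer.write(record);
            }
            writer.commit();
            return true;
        } catch (RuntimeException e) {
            writer.abort();
            return false;
        } finally {
            writer.close();
        }
    }
}
```

With this shape, commit() and abort() only need to deal with the outcome of the write, and implementors do not have to make cleanup idempotent to survive an abort() that follows a failed commit().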
