spark-dev mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: [DISCUSS] Add close() on DataWriter interface
Date Wed, 11 Dec 2019 09:18:13 GMT
Thanks for the quick response, Wenchen!

I'll leave this thread open until early tomorrow so that someone in a US
timezone can chime in, and I'll craft a patch if no one objects.

On Wed, Dec 11, 2019 at 4:41 PM Wenchen Fan <cloud0fan@gmail.com> wrote:

> PartitionReader extends Closeable, so it seems reasonable to me to do the
> same for DataWriter.
>
> On Wed, Dec 11, 2019 at 1:35 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
> wrote:
>
>> Hi devs,
>>
>> I'd like to propose adding an explicit close() method to DataWriter as
>> the place for resource cleanup.
>>
>> The rationale for the proposal is the lifecycle of DataWriter. If the
>> scaladoc of DataWriter is correct, the lifecycle of a DataWriter instance
>> ends at either commit() or abort(). That leads data source implementors to
>> feel they can place resource cleanup in both methods, but abort() can be
>> called when commit() fails, so they have to ensure they don't clean up
>> twice if the cleanup is not idempotent.
>>
>> I've checked some callers to see whether they can apply
>> "try-catch-finally" to ensure close() is called at the end of the
>> DataWriter lifecycle, and it looks like they can, but I might be missing
>> something.
>>
>> What do you think? It would be a backward-incompatible change, but given
>> that the interface is marked as Evolving and we're already making
>> backward-incompatible changes in Spark 3.0, I feel it may not matter.
>> 
>>
>> Would love to hear your thoughts.
>>
>> Thanks in advance,
>> Jungtaek Lim (HeartSaVioR)
>>
>>
>>
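
For readers following the thread, here is a minimal sketch of the lifecycle being proposed: cleanup lives only in close(), and the caller wraps the writer in try-catch-finally so close() runs after either commit() or abort(). Everything in it is hypothetical and simplified: SketchWriter is a stand-in for DataWriter rather than the real Spark interface, RecordingWriter and runTask are made-up names, and checked exceptions are left out for brevity (the real interface's methods may declare them).

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DataWriterCloseSketch {

    // Hypothetical, simplified stand-in for DataWriter with the proposed close().
    // Checked exceptions are omitted here to keep the sketch short.
    interface SketchWriter<T> extends Closeable {
        void write(T record);
        void commit();
        void abort();
        @Override void close();
    }

    // Hypothetical writer that records lifecycle events so the call order is visible.
    static class RecordingWriter implements SketchWriter<String> {
        final List<String> events = new ArrayList<>();
        private final boolean failOnCommit;

        RecordingWriter(boolean failOnCommit) {
            this.failOnCommit = failOnCommit;
        }

        @Override public void write(String record) {
            events.add("write:" + record);
        }

        @Override public void commit() {
            if (failOnCommit) {
                throw new IllegalStateException("simulated commit failure");
            }
            events.add("commit");
        }

        // No resource cleanup here, so abort() after a failed commit() cannot double-clean.
        @Override public void abort() {
            events.add("abort");
        }

        // The single place for resource cleanup.
        @Override public void close() {
            events.add("close");
        }
    }

    // Caller-side lifecycle: commit on success, abort on failure, close in both cases.
    // Returns true if the commit succeeded; a real caller would likely rethrow instead.
    static <T> boolean runTask(SketchWriter<T> writer, List<T> records) {
        try {
            for (T record : records) {
                writer.write(record);
            }
            writer.commit();
            return true;
        } catch (RuntimeException e) {
            writer.abort();
            return false;
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        RecordingWriter ok = new RecordingWriter(false);
        runTask(ok, Arrays.asList("a", "b"));
        System.out.println(ok.events);      // [write:a, write:b, commit, close]

        RecordingWriter failing = new RecordingWriter(true);
        runTask(failing, Arrays.asList("a"));
        System.out.println(failing.events); // [write:a, abort, close]
    }
}
```

In both runs close() is called exactly once, so an implementor would not need to make cleanup idempotent or track whether commit() already released resources.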
