spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ryan Blue <rb...@netflix.com.INVALID>
Subject Re: [DISCUSS] Add close() on DataWriter interface
Date Wed, 11 Dec 2019 18:31:47 GMT
Sounds good to me, too.

On Wed, Dec 11, 2019 at 1:18 AM Jungtaek Lim <kabhwan.opensource@gmail.com>
wrote:

> Thanks for the quick response, Wenchen!
>
> I'll leave this thread for early tomorrow so that someone in US timezone
> can chime in, and craft a patch if no one objects.
>
> On Wed, Dec 11, 2019 at 4:41 PM Wenchen Fan <cloud0fan@gmail.com> wrote:
>
>> PartitionReader extends Closable, seems reasonable to me to do the same
>> for DataWriter.
>>
>> On Wed, Dec 11, 2019 at 1:35 PM Jungtaek Lim <
>> kabhwan.opensource@gmail.com> wrote:
>>
>>> Hi devs,
>>>
>>> I'd like to propose to add close() on DataWriter explicitly, which is
>>> the place for resource cleanup.
>>>
>>> The rationalization of the proposal is due to the lifecycle of
>>> DataWriter. If the scaladoc of DataWriter is correct, the lifecycle of
>>> DataWriter instance ends at either commit() or abort(). That makes
>>> datasource implementors to feel they can place resource cleanup in both
>>> sides, but abort() can be called when commit() fails; so they have to
>>> ensure they don't do double-cleanup if cleanup is not idempotent.
>>>
>>> I've checked some callers to see whether they can apply
>>> "try-catch-finally" to ensure close() is called at the end of lifecycle for
>>> DataWriter, and they look like so, but I might be missing something.
>>>
>>> What do you think? It would bring backward incompatible change, but
>>> given the interface is marked as Evolving and we're making backward
>>> incompatible changes in Spark 3.0, so I feel it may not matter.
>>>
>>> Would love to hear your thoughts.
>>>
>>> Thanks in advance,
>>> Jungtaek Lim (HeartSaVioR)
>>>
>>>
>>>

-- 
Ryan Blue
Software Engineer
Netflix

Mime
View raw message