spark-dev mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: [DISCUSS] Add close() on DataWriter interface
Date Wed, 11 Dec 2019 18:53:21 GMT
Is this something that would be exposed/relevant to the Python API? Or is
this just for people implementing their own Spark data source?

On Wed, Dec 11, 2019 at 12:35 AM Jungtaek Lim <kabhwan.opensource@gmail.com>
wrote:

> Hi devs,
>
> I'd like to propose explicitly adding close() to the DataWriter interface,
> as the designated place for resource cleanup.
>
> The rationale for the proposal is the lifecycle of DataWriter.
> If the scaladoc of DataWriter is correct, the lifecycle of a DataWriter
> instance ends at either commit() or abort(). That leads datasource
> implementors to feel they can place resource cleanup in either method, but
> abort() can be called when commit() fails; so they have to ensure they
> don't do double cleanup if their cleanup is not idempotent.
>
> I've checked some callers to see whether they can apply
> "try-catch-finally" to ensure close() is called at the end of the
> DataWriter lifecycle, and it appears they can, but I might be missing
> something.
>
> What do you think? It would be a backward-incompatible change, but given
> that the interface is marked as Evolving and we're already making
> backward-incompatible changes in Spark 3.0, I feel it may not matter.
>
> Would love to hear your thoughts.
>
> Thanks in advance,
> Jungtaek Lim (HeartSaVioR)
>
>
>
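For context, the lifecycle Jungtaek describes can be sketched with a minimal, illustrative mirror of the writer contract. This is not Spark's actual API: the names SimpleDataWriter, LoggingWriter, and runLifecycle are hypothetical stand-ins showing how a caller-side try-catch-finally makes close() run exactly once, whether commit() succeeds or abort() is invoked after a failure.

```java
import java.io.IOException;

public class WriterLifecycleSketch {

    // Hypothetical mirror of the DataWriter lifecycle discussed in the
    // thread: write(), then commit() or (on failure) abort(), then close().
    interface SimpleDataWriter extends AutoCloseable {
        void write(String record) throws IOException;
        void commit() throws IOException;
        void abort() throws IOException;
        @Override
        void close(); // single, explicit place for resource cleanup
    }

    // Test double that counts close() calls and can simulate commit failure.
    static class LoggingWriter implements SimpleDataWriter {
        int closeCalls = 0;
        private final boolean failOnCommit;

        LoggingWriter(boolean failOnCommit) { this.failOnCommit = failOnCommit; }

        public void write(String record) { /* buffer the record */ }

        public void commit() throws IOException {
            if (failOnCommit) throw new IOException("commit failed");
        }

        public void abort() { /* discard buffered data */ }

        public void close() { closeCalls++; } // idempotency no longer required of commit/abort
    }

    // Caller-side pattern: abort on failure, close in finally, so cleanup
    // happens exactly once regardless of which path was taken.
    static void runLifecycle(SimpleDataWriter writer, String record) throws IOException {
        try {
            writer.write(record);
            writer.commit();
        } catch (IOException e) {
            writer.abort();
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        LoggingWriter ok = new LoggingWriter(false);
        runLifecycle(ok, "a");
        LoggingWriter failing = new LoggingWriter(true);
        runLifecycle(failing, "b");
        System.out.println(ok.closeCalls + " " + failing.closeCalls); // prints "1 1"
    }
}
```

The point of the sketch: once close() exists, commit() and abort() no longer need to double as cleanup hooks, so implementors don't have to make cleanup idempotent to guard against abort() running after a failed commit().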
