spark-user mailing list archives

From Alex Rehnby
Subject [Datasource API V2] Creating datasource - no step for final cleanup on read
Date Mon, 18 Jan 2021 19:38:53 GMT

I'm currently working on a custom datasource using the Spark Datasource
API V2. On read, our datasource uses some temporary files in a distributed
store, and we'd like to run a cleanup step on them once the entire
operation is done. However, the API does not seem to call anything when an
entire read has finished; there is only the close() function on individual
PartitionReaders.
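
To make the gap concrete, here is a minimal, self-contained sketch of the situation. It does not use Spark's actual interfaces: SketchReader, ReadFinishedHook, cleanupAfterRead() and simulateRead() are all hypothetical names made up for illustration, and the read runs in a single JVM instead of across executors. It only shows that close() fires once per partition, while the step we would need runs once after every partition is finished.

```java
import java.util.ArrayList;
import java.util.List;

public class ReadCleanupSketch {

    /** Stand-in for a per-partition reader (hypothetical, not Spark's interface). */
    static final class SketchReader implements AutoCloseable {
        private final int partition;
        private final List<String> events;
        private int remaining = 2; // pretend each partition holds two rows

        SketchReader(int partition, List<String> events) {
            this.partition = partition;
            this.events = events;
        }

        boolean next() {
            return remaining-- > 0;
        }

        @Override
        public void close() {
            // Fires once per partition, so it cannot tell whether the whole read is done.
            events.add("close partition " + partition);
        }
    }

    /** Hypothetical hook: the "entire read is done" step that appears to be missing. */
    interface ReadFinishedHook {
        void cleanupAfterRead();
    }

    /** Simulates a read over numPartitions partitions and returns the ordered event log. */
    static List<String> simulateRead(int numPartitions) {
        List<String> events = new ArrayList<>();
        ReadFinishedHook hook = () -> events.add("delete shared temporary files");
        for (int p = 0; p < numPartitions; p++) {
            try (SketchReader reader = new SketchReader(p, events)) {
                while (reader.next()) {
                    // rows would be consumed here
                }
            }
        }
        // The call that has no obvious home on the read side.
        hook.cleanupAfterRead();
        return events;
    }

    public static void main(String[] args) {
        simulateRead(3).forEach(System.out::println);
    }
}
```

In a real job the readers run on different executors and may finish in any order, so a single call site like the one above is exactly what cannot be derived from close() alone.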

What I'm looking for is an equivalent of the commit() and abort()
functions in BatchWrite, but for the Scan or Batch class. Is there any
good way to run something at the end of the read operation using the
current API? If not, I would ask if this might be a useful addition, or
if there are design reasons for not including such a

