spark-dev mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Public API access to UDTs
Date Fri, 29 Jan 2021 14:46:07 GMT
I'm also interested: are there problems with opening up this API beyond
needing to freeze it and keep it stable? It's pretty stable already.
Could it be exposed as @DeveloperApi at least?
Are there implications for storing UDTs in particular engines or formats?
Just making it public for developers, even with a 'use at your own risk'
warning, seems like a pretty small change?

On Thu, Jan 28, 2021 at 5:10 PM Fitch, Simeon <fitch@astraea.io> wrote:

> Hi,
>
> First time posting here, so apologies if I need to be directing this topic
> elsewhere.
>
> I'm the author of RasterFrames, and a contributor to GeoMesa's Spark SQL
> module. Both make use of fairly low-level Catalyst constructs, including
> custom UDTs: RasterFrames introduces a geospatial raster type, and GeoMesa
> a geometry type.
>
> In order to make this work we've circumvented the [`package private`](
> https://bit.ly/3pr0fVv) restriction on `UDTRegistration` by inserting
> sibling classes into the package namespace. It's a hack, and works fine
> with JVM 8, but violates the [much more restrictive](
> https://bit.ly/3aadO5g) module constructs in JVM 9+.
>
> We've been monitoring [SPARK-7768](
> https://issues.apache.org/jira/browse/SPARK-7768) (filed in 2015) and
> its [associated PR](https://github.com/apache/spark/pull/16478) for
> years now, but it keeps getting kicked down the road(map).
>
> As authors of open source systems we completely understand how and why
> this happens, but we are at a critical juncture in our projects' lifecycle,
> anchored to JVM 8 while other systems have moved on to later versions. We'd
> also like to enjoy the benefits of later JVMs.
>
> So... I'm here to find out how I and others critically needing public
> access to `UDTRegistration` might better advocate for it?
>
> I think (but am not 100% sure) the PR linked above is more extensive than
> what we need, also addressing usability around Encoders, for which we have
> our own type class solution. My assumption to date has been all we need is
> line 32 of `UDTRegistration` deleted (if there's folly therein, please say
> so!). While I understand a reluctance to promote `UDTRegistration` to
> `public`, I note that it has not been changed since 2016, perhaps a good
> indicator that the API is stable enough. Marking it as `@Experimental`
> could be a compromise option.
>
> Thanks for reading this far and giving this consideration. Any and all
> advice is appreciated.
>
> Simeon (@metasim)
>
>
> --
> Simeon Fitch
> Co-founder & VP of R&D
> Astraea, Inc.
>
>
