spark-dev mailing list archives

From "Fitch, Simeon" <>
Subject Re: Public API access to UDTs
Date Fri, 29 Jan 2021 15:42:02 GMT
On Fri, Jan 29, 2021 at 9:46 AM Sean Owen <> wrote:

> Are there implications for storing UDTs in particular engines or formats?

I've found that UDTs can be written to and read from Parquet without problems.

They work fine with PySpark when mirror classes are implemented. Without
properly constructed mirror classes they show up as structs, which isn't a
bad fallback.

However, they do *not* work with Spark's use of Arrow, as they get rejected

> Just making it public for developers, even with a 'use at your own risk'
> warning, seems pretty small as a change?
> On Thu, Jan 28, 2021 at 5:10 PM Fitch, Simeon <> wrote:
>> Hi,
>> First time posting here, so apologies if I need to be directing this
>> topic elsewhere.
>> I'm the author of RasterFrames, and a contributor to GeoMesa's Spark SQL
>> module. Both make use of fairly low-level Catalyst constructs, including
>> custom UDTs; RasterFrames introduces a geospatial raster type, and GeoMesa
>> a geometry type.
>> In order to make this work we've circumvented the `package private`
>> restriction on `UDTRegistration` by inserting sibling classes into the
>> package namespace. It's a hack, and works fine with JVM 8, but violates
>> the much more restrictive module constructs in JVM 9+.
>> We've been monitoring SPARK-7768 (filed in 2015) and its associated PR
>> for years now, but it keeps getting kicked down the road(map).
>> As authors of open source systems we completely understand how and why
>> this happens, but we are at a critical juncture in our projects' lifecycle,
>> anchored to JVM 8 while other systems have moved on to later versions. We'd
>> also like to enjoy the benefits of later JVMs.
>> So... I'm here to ask: how might I and others who critically need public
>> access to `UDTRegistration` better advocate for it?
>> I think (but am not 100% sure) the PR linked above is more extensive than
>> what we need, also addressing usability around Encoders, for which we have
>> our own type class solution. My assumption to date has been that all we
>> need is for line 32 of `UDTRegistration` to be deleted (if there's folly
>> therein, please say so!). While I understand a reluctance to promote
>> `UDTRegistration` to `public`, I note that it has not changed since 2016,
>> perhaps a good indicator that the API is stable enough. Marking it as
>> `@Experimental` could be a compromise option.
>> Thanks for reading this far and giving this consideration. Any and all
>> advice is appreciated.
>> Simeon (@metasim)
>> --
>> Simeon Fitch
>> Co-founder & VP of R&D
>> Astraea, Inc.

Simeon Fitch
Co-founder & VP of R&D
Astraea, Inc.
