spark-issues mailing list archives

From "Zachary S Ennenga (Jira)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-28889) Allow UDTs to define custom casting behavior
Date Fri, 30 Aug 2019 18:39:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-28889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919819#comment-16919819 ]

Zachary S Ennenga commented on SPARK-28889:
-------------------------------------------

Based on https://issues.apache.org/jira/browse/SPARK-7768, it seems the intent is to make the
UDT API public again, though it has been pushed back a few times for reasons that aren't really
discussed in the ticket. Is there another way to define custom encoders for types within Datasets
before that ticket is completed?

If there isn't, and the intent is to solve that problem via UDTs, this enhancement seems useful
for a specific set of problems: automatically transforming simple types in Hive (e.g. string)
into complex types (e.g. LocalDate) in Datasets by using dataframe.as[ComplexType].

> Allow UDTs to define custom casting behavior
> --------------------------------------------
>
>                 Key: SPARK-28889
>                 URL: https://issues.apache.org/jira/browse/SPARK-28889
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.3
>            Reporter: Zachary S Ennenga
>            Priority: Minor
>
> Looking at `org.apache.spark.sql.catalyst.expressions.Cast`, UDTs do not support any sort of casting except for identity casts, i.e.:
> {code:java}
> case (udt1: UserDefinedType[_], udt2: UserDefinedType[_]) if udt1.userClass == udt2.userClass =>
>  true
> {code}
> I propose we add an additional piece of functionality here to allow UDTs to define their own canCast and cast functions, so that users can define their own cast mechanisms.
> An example of how this might look:
> {code:java}
> case (fromType, toType: UserDefinedType[_]) =>
>  toType.canCast(fromType) // Returns boolean
> {code}
> {code:java}
> case (fromType, toType: UserDefinedType[_]) =>
>  toType.cast(fromType) // Returns Casting function
> {code}
> The UDT base class would contain a default implementation that replicates current behavior (i.e. no casting).
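
A minimal, self-contained sketch of how the proposal above might fit together. It does not use Spark: `DataType`, `StringType`, `IntegerType` and `UserDefinedType` below are simplified stand-ins, not the real Spark classes. Only the hook names `canCast` and `cast` come from the proposal; the `Option[Any => Any]` return type, `LocalDateUDT` and `CastSketch` are assumptions made for illustration.

```scala
import java.time.LocalDate

// Simplified stand-ins for Spark's type hierarchy (NOT the real Spark classes).
sealed trait DataType
case object StringType extends DataType
case object IntegerType extends DataType

// Hypothetical UDT base class carrying the two proposed hooks.
abstract class UserDefinedType[T] extends DataType {
  def userClass: Class[T]

  // Default replicates current behavior: no casting from other types.
  def canCast(from: DataType): Boolean = false

  // Default: no casting function is available.
  def cast(from: DataType): Option[Any => Any] = None
}

// Hypothetical UDT that opts in to casting from strings to LocalDate.
object LocalDateUDT extends UserDefinedType[LocalDate] {
  override def userClass: Class[LocalDate] = classOf[LocalDate]

  override def canCast(from: DataType): Boolean = from == StringType

  override def cast(from: DataType): Option[Any => Any] = from match {
    case StringType => Some((value: Any) => LocalDate.parse(value.toString))
    case _          => None
  }
}

// Hypothetical dispatch, mirroring the pattern matches quoted in the issue.
object CastSketch {
  def canCast(from: DataType, to: DataType): Boolean = (from, to) match {
    // Existing rule: identity cast between UDTs with the same user class.
    case (udt1: UserDefinedType[_], udt2: UserDefinedType[_])
        if udt1.userClass == udt2.userClass => true
    // Proposed rule: let the target UDT decide.
    case (fromType, toType: UserDefinedType[_]) => toType.canCast(fromType)
    case _ => from == to
  }

  def castFunction(from: DataType, to: DataType): Option[Any => Any] = (from, to) match {
    case (udt1: UserDefinedType[_], udt2: UserDefinedType[_])
        if udt1.userClass == udt2.userClass => Some((value: Any) => value)
    case (fromType, toType: UserDefinedType[_]) => toType.cast(fromType)
    case _ if from == to => Some((value: Any) => value)
    case _ => None
  }
}
```

With these definitions, `CastSketch.canCast(StringType, LocalDateUDT)` is true while `CastSketch.canCast(IntegerType, LocalDateUDT)` is false, and a UDT that overrides neither hook keeps the identity-only behavior. Returning an `Option` from `cast` is one possible design; the proposal itself only says that `cast` returns a casting function.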



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


