spark-dev mailing list archives

From Erik LaBianca <erik.labia...@gmail.com>
Subject Re: ability to provide custom serializers
Date Mon, 05 Dec 2016 01:39:50 GMT
Thanks Michael!

> On Dec 2, 2016, at 7:29 PM, Michael Armbrust <michael@databricks.com> wrote:
> 
> I would love to see something like this.  The closest related ticket is probably
> https://issues.apache.org/jira/browse/SPARK-7768 (though maybe there are enough people
> using UDTs in their current form that we should just make a new ticket)

I’m not very familiar with UDTs. Is this something I should research, or should I just leave
them be and create a new ticket? I did notice the presence of a registry in the source code,
but it seemed to be targeted at a different use case.

> A few thoughts:
>  - even if you can do implicit search, we probably also want a registry for Java users.

That’s fine. I’m not 100% sure I can get the right implicit in scope as things stand anyway,
so let’s table that idea for now and do the registry.
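
To make the registry idea a bit more concrete, here’s a rough sketch of the shape I have in
mind. None of these names exist in Spark today; CustomEncoderRegistry, MappedEncoding, and
UserId are all invented for illustration. The point is just that a Java caller registers a
two-way mapping between their type and a type Spark can already encode, no implicits required:

    import scala.collection.concurrent.TrieMap

    // A "layered" mapping: T is encoded by first converting it to U, a type
    // Spark can already encode, and decoded by going the other way.
    final case class MappedEncoding[T, U](toUnderlying: T => U,
                                          fromUnderlying: U => T)

    object CustomEncoderRegistry {
      private val mappings = TrieMap.empty[Class[_], MappedEncoding[_, _]]

      // Java-friendly registration: pass the Class explicitly instead of
      // relying on implicit search.
      def register[T, U](clazz: Class[T], m: MappedEncoding[T, U]): Unit =
        mappings.put(clazz, m)

      def lookup[T](clazz: Class[T]): Option[MappedEncoding[T, _]] =
        mappings.get(clazz).map(_.asInstanceOf[MappedEncoding[T, _]])
    }

    // Example: register a wrapper type that should be stored as a plain long.
    final case class UserId(value: Long)

    object Example {
      CustomEncoderRegistry.register(classOf[UserId],
        MappedEncoding[UserId, Long](_.value, UserId(_)))
    }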

>  - what is the output of the serializer going to be? one challenge here is that encoders
> write directly into the tungsten format, which is not a stable public API. Maybe this is
> more obvious if I understood MappedColumnType better?

My assumption was that the output would be existing scalar data types: string, long, double,
etc. What I’d like to do is just “layer” the new types on top of already existing ones,
kind of like the case class encoder does.
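
For reference (assuming we’re talking about Slick’s MappedColumnType here), the pattern I’m
borrowing from looks roughly like this; OrderId is a made-up example type, and the exact
profile import depends on the database and Slick version:

    object SlickExample {
      import slick.jdbc.H2Profile.api._

      final case class OrderId(value: Long)

      // The custom type is "layered" on an existing scalar column type:
      // OrderId columns are stored as plain longs, converted both ways.
      implicit val orderIdColumnType: BaseColumnType[OrderId] =
        MappedColumnType.base[OrderId, Long](_.value, OrderId(_))
    }

That two-way scalar mapping is exactly the shape of thing I’d like to be able to register
with Spark.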

> Either way, I'm happy to give further advice if you come up with a more concrete proposal
> and put it on JIRA.

Great, let me know and I’ll create a ticket, or we can re-use SPARK-7768 and move the
discussion there.

Thanks!

—erik

