spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jakob Odersky <ja...@odersky.com>
Subject Re: How to use custom class in DataSet
Date Tue, 30 Aug 2016 22:57:50 GMT
Implementing custom encoders is unfortunately not well supported at
the moment (IIRC there are plans to eventually add an api for user
defined encoders).

That being said, there are a couple of encoders that can work with
generic, serializable data types: "javaSerialization" and "kryo",
found here http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Encoders$.
These encoders need to be specified explicitly, as in
"spark.createDataset(...)(Encoders.javaSerialization)"

In Spark 2.1 there will also be special trait
"org.apache.spark.sql.catalyst.DefinedByConstructorParams" that can be
mixed into arbitrary classes and that has implicit encoders available.

If you don't control the source of the class in question and it is not
serializable, it may still be possible to define your own Encoder by
implementing your own "o.a.s.sql.catalyst.encoders.ExpressionEncoder".
However, that requires quite some knowledge on how Spark's SQL
optimizer (catalyst) works internally and I don't think there is much
documentation on that.

regards,
--Jakob

On Mon, Aug 29, 2016 at 10:39 PM, canan chen <ccnfdu@gmail.com> wrote:
>
> e.g. I have a custom class A (not case class), and I'd like to use it as
> DataSet[A]. I guess I need to implement Encoder for this, but didn't find
> any example for that, is there any document for that ? Thanks
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Mime
View raw message