You can still use it as a Dataset[Set[X]]; all typed transformations (map, filter, etc.) should work correctly.

However, dataset.schema will show a binary type, and dataset.show will only display the serialized bytes (unfortunately).

For example:

scala> implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]
setEncoder: [X]=> org.apache.spark.sql.Encoder[Set[X]]

scala> val x = Seq(Set(1,2,3)).toDS
x: org.apache.spark.sql.Dataset[scala.collection.immutable.Set[Int]] = [value: binary]

scala> x.map(_ + 4).collect
res17: Array[scala.collection.immutable.Set[Int]] = Array(Set(1, 2, 3, 4))

scala> x.show
+--------------------+
|               value|
+--------------------+
|[2A 01 03 02 02 0...|
+--------------------+


scala> x.schema
res19: org.apache.spark.sql.types.StructType = StructType(StructField(value,BinaryType,true))
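If the raw bytes in show() get in the way, one option is a typed map into a type that Spark encodes natively, just for inspection. The sketch below assumes Spark 2.x or later and a local SparkSession. `KryoSetDisplay` and `readableView` are hypothetical names, and sorting each Set into a Seq is only one way to get a stable, printable form:

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

object KryoSetDisplay {
  // Hypothetical helper: maps each Kryo-encoded Set into a sorted Seq,
  // which Spark encodes natively as an array column (readable in show()).
  def readableView(spark: SparkSession, ds: Dataset[Set[Int]]): Dataset[Seq[Int]] = {
    import spark.implicits._ // provides the Encoder[Seq[Int]] used by map
    ds.map(_.toSeq.sorted)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("kryo-set-display").getOrCreate()

    // Same approach as the REPL session: a Kryo encoder for Set[Int] in implicit scope
    implicit val setEncoder: Encoder[Set[Int]] = Encoders.kryo[Set[Int]]

    val x: Dataset[Set[Int]] = spark.createDataset(Seq(Set(1, 2, 3)))
    // Typed transformations still operate on real Set[Int] values
    val y: Dataset[Set[Int]] = x.map(_ + 4)

    y.printSchema()                      // single binary column
    readableView(spark, y).printSchema() // array of integers instead
    readableView(spark, y).show()

    spark.stop()
  }
}
```

The underlying Dataset stays Kryo-encoded; only the inspection view is converted.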


On Wed, Feb 1, 2017 at 12:03 PM, Jerry Lam <chilinglam@gmail.com> wrote:
Hi Koert,

Thanks for the tip. I tried that, but the column's type is now binary. Do I still get a Set[X] back from the Dataset?

Best Regards,

Jerry


On Tue, Jan 31, 2017 at 8:04 PM, Koert Kuipers <koert@tresata.com> wrote:
Set is currently not supported. You can use a Kryo encoder; there is no other workaround that I know of.

import org.apache.spark.sql.{ Encoder, Encoders }
implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]
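For the aggregation case in the original question, here is a minimal sketch using `org.apache.spark.sql.expressions.Aggregator`. `PairsOut`, `CollectPairs` and `CollectPairsExample` are hypothetical names. As far as I know, an implicit `Encoder[Set[X]]` is not picked up for a Set nested inside a case class (the case-class encoder is derived by reflection), so this sketch Kryo-encodes the whole output case class and the buffer instead:

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical output type, shaped like the one in the question:
// a case class with a Set[(Long, Long)] field.
case class PairsOut(pairs: Set[(Long, Long)])

// Hypothetical aggregator that collects (Long, Long) pairs into a Set.
object CollectPairs extends Aggregator[(Long, Long), Set[(Long, Long)], PairsOut] {
  def zero: Set[(Long, Long)] = Set.empty

  def reduce(buf: Set[(Long, Long)], in: (Long, Long)): Set[(Long, Long)] = buf + in

  def merge(a: Set[(Long, Long)], b: Set[(Long, Long)]): Set[(Long, Long)] = a ++ b

  def finish(buf: Set[(Long, Long)]): PairsOut = PairsOut(buf)

  // Kryo for both buffer and output: each becomes a single binary column,
  // which sidesteps the missing Set encoder.
  def bufferEncoder: Encoder[Set[(Long, Long)]] = Encoders.kryo[Set[(Long, Long)]]

  def outputEncoder: Encoder[PairsOut] = Encoders.kryo[PairsOut]
}

object CollectPairsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("kryo-set-aggregator").getOrCreate()
    import spark.implicits._ // tuple encoder for the input Dataset

    val pairs = Seq((1L, 2L), (3L, 4L), (1L, 2L)).toDS()
    // select(TypedColumn) yields a Dataset[PairsOut]; first() decodes it back via Kryo
    val result: PairsOut = pairs.select(CollectPairs.toColumn).first()
    println(result)

    spark.stop()
  }
}
```

The aggregated column is again binary in the schema, but first()/collect() return ordinary PairsOut values. The same `CollectPairs.toColumn` should also work with `groupByKey(...).agg(...)` for per-key results.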

On Tue, Jan 31, 2017 at 7:33 PM, Jerry Lam <chilinglam@gmail.com> wrote:
Hi guys,

I got the following exception when I tried to implement a user-defined aggregation function:

 Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for Set[(scala.Long, scala.Long)]

The Set[(Long, Long)] is a field of the case class that serves as the output type of the aggregation.

Is there a workaround for this?

Best Regards,

Jerry