spark-user mailing list archives

From V0lleyBallJunki3 <>
Subject Set can be passed in as an input argument but not as output
Date Tue, 04 Sep 2018 03:47:03 GMT
I find that when a Set is passed in as an input argument, Spark doesn't try
to find an encoder for it, but when the output of a method is a Set, Spark
does try to find an encoder and errors out if none is found. My understanding
is that the input Set also has to be transferred to the executors, so why is
no encoder needed for it?

The first method, testMethodSet, takes an Array[Set[String]] and returns a
Set[String]:
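import org.apache.spark.sql.Row

def testMethodSet(info: Row, arrSetString: Array[Set[String]]): Set[String] = {
  val assignments = info.getAs[Seq[String]](0).toSet
  for (setString <- arrSetString) {
    if (setString.subsetOf(assignments)) {
      return setString
    }
  }
  // The end of the method is cut off in the archive; an empty Set is assumed here.
  Set.empty[String]
}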

Another method takes the same arguments as the first one but returns a
Seq[String]:
def testMethodSeq(info: Row, arrSetString: Array[Set[String]]): Seq[String] = {
  val assignments = info.getAs[Seq[String]](0).toSet

  for (setString <- arrSetString) {
    if (setString.subsetOf(assignments)) {
      return setString.toSeq
    }
  }
  // The end of the method is cut off in the archive; an empty Seq is assumed here.
  Seq.empty[String]
}


The testMethodSet method throws an error =>
testMethodSet(s,"test",Array((Set("aaaa","bbbb")))))
<console>:20: error: Unable to find encoder for type stored in a Dataset. 
Primitive types (Int, String, etc) and Product types (case classes) are
supported by importing spark.implicits._  Support for serializing other
types will be added in future releases. =>

The testMethodSeq method works fine =>
testMethodSeq(s, Array((Set("aaaa","bbbb")))))
res12: org.apache.spark.sql.Dataset[Seq[String]] = [value: array<string>]
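
For reference, the difference between the two cases can be sketched as follows.
This is only a rough illustration, not necessarily how the original calls were
written. It assumes a local SparkSession, a hypothetical single-column DataFrame
named df, and a hypothetical wrapper object EncoderSketch. The idea is that the
input array is captured by the closure passed to map and shipped with it, while
the result type of map becomes the element type of the new Dataset and therefore
needs an Encoder. One possible workaround, shown here, is to pass a Kryo-based
encoder explicitly.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, Row, SparkSession}

object EncoderSketch {
  // Same logic as testMethodSet; the empty-Set fallback is an assumption.
  def testMethodSet(info: Row, arrSetString: Array[Set[String]]): Set[String] =
    arrSetString
      .find(_.subsetOf(info.getAs[Seq[String]](0).toSet))
      .getOrElse(Set.empty[String])

  def run(): Seq[Set[String]] = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("encoder-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: one array<string> column, as implied by getAs[Seq[String]](0).
    val df = Seq(Seq("aaaa", "bbbb", "cccc"), Seq("dddd")).toDF("assignments")

    // Captured by the closure below and sent to the executors through closure
    // serialization, so no Encoder is looked up for it.
    val candidates = Array(Set("aaaa", "bbbb"))

    // The return type of the function passed to map is what needs an Encoder.
    // A Kryo-based encoder stores each Set as a single binary column.
    val setEncoder: Encoder[Set[String]] = Encoders.kryo[Set[String]]
    val result: Dataset[Set[String]] =
      df.map(row => testMethodSet(row, candidates))(setEncoder)

    val collected = result.collect().toSeq
    spark.stop()
    collected
  }
}
```

Calling EncoderSketch.run() collects the resulting Sets back to the driver.
With a Kryo encoder the column is opaque binary, so returning a Seq, as
testMethodSeq does, keeps the result queryable as array<string>.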

