spark-issues mailing list archives

From Andrés Doncel Ramírez (JIRA)
Subject [jira] [Created] (SPARK-26869) UDF with struct requires to have _1 and _2 as struct field names
Date Wed, 13 Feb 2019 12:04:00 GMT
Andrés Doncel Ramírez created SPARK-26869:

             Summary: UDF with struct requires to have _1 and _2 as struct field names
                 Key: SPARK-26869
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0, 2.3.0
         Environment: Ubuntu 18.04.1 LTS
            Reporter: Andrés Doncel Ramírez

When a UDF takes a Seq of tuples as input, the field names of the struct column passed to it
must be "_1" and "_2". The following code illustrates this.

val df = sc.parallelize(Array(

val df1=df.agg(collect_list(struct("c1","c2")).as("c3"))
// Changing column names to _1 and _2 when creating the struct
val df2=df.agg(collect_list(struct(col("c1").as("_1"),col("c2").as("_2"))).as("c3"))

def takeUDF = udf({ (xs: Seq[(String, Double)]) =>


df1.withColumn("c4",takeUDF(col("c3"))).show() // this fails

df2.withColumn("c4",takeUDF(col("c3"))).show() // this works

The first call fails with the following exception:

org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(c3)' due to data type mismatch:
argument 1 requires array<struct<_1:string,_2:double>> type, however, '`c3`' is
of array<struct<c1:string,c2:double>> type.;;

The second call works as expected and prints the result.
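
The expected type in the exception comes from the UDF's Scala signature: a Seq[(String, Double)] parameter is a sequence of Tuple2 values, whose fields are named _1 and _2. The following is a minimal sketch of two possible workarounds that leave the struct field names unchanged; it is not taken from the report. The sample rows, the UDF bodies, and the names takeTyped and takeRows are hypothetical, and it builds a local SparkSession instead of relying on the spark-shell `sc`. It has not been verified against the affected versions listed above.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct, udf}

object StructFieldNamesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-26869-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; the data used in the original report is not shown.
    val df = Seq(("a", 1.0), ("b", 2.0), ("c", 3.0)).toDF("c1", "c2")

    // Struct fields keep the column names c1 and c2, as in df1 from the report.
    val df1 = df.agg(collect_list(struct("c1", "c2")).as("c3"))

    // Hypothetical UDF body: keep the first two elements of the array.
    val takeTyped = udf((xs: Seq[(String, Double)]) => xs.take(2))

    // Workaround 1 (assumption: struct-to-struct casts match fields by position):
    // cast the column to the type the UDF expects before calling it.
    val expectedType = "array<struct<_1:string,_2:double>>"
    df1.withColumn("c4", takeTyped(col("c3").cast(expectedType))).show(false)

    // Workaround 2: accept untyped Rows and read the fields by position,
    // so the struct field names are never compared against _1 and _2.
    val takeRows = udf((xs: Seq[Row]) =>
      xs.take(2).map(r => (r.getString(0), r.getDouble(1))))
    df1.withColumn("c4", takeRows(col("c3"))).show(false)

    spark.stop()
  }
}
```

In both variants the returned column c4 still uses _1 and _2 as field names, because the UDF's return type is a Seq of tuples.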
