spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject union of compatible types
Date Wed, 01 Feb 2017 16:02:00 GMT
spark's union/merging of compatible types seems kind of weak. it works for
basic types at the top level of a record, but it fails for nested records,
maps, arrays, etc.

are there any known workarounds or plans to improve this?

for example i get errors like this:
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the compatible column types.
StructType(StructField(_1,StringType,true), StructField(_2,IntegerType,false)) <>
StructType(StructField(_1,StringType,true), StructField(_2,LongType,false))
at the first column of the second table

some examples that do work:

scala> Seq(1, 2, 3).toDF union Seq(1L, 2L, 3L).toDF
res2: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [value: bigint]

scala> Seq((1,"x"), (2,"x"), (3,"x")).toDF union Seq((1L,"x"), (2L,"x"), (3L,"x")).toDF
res3: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: bigint, _2: string]

what i would also expect to work but currently doesn't:

scala> Seq((Seq(1),"x"), (Seq(2),"x"), (Seq(3),"x")).toDF union Seq((Seq(1L),"x"), (Seq(2L),"x"), (Seq(3L),"x")).toDF

scala> Seq((1,("x",1)), (2,("x",2)), (3,("x",3))).toDF union Seq((1L,("x",1L)), (2L,("x",2L)), (3L,("x",3L))).toDF
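What the two failing examples need is for type widening to recurse into nested types instead of stopping at the top level. As a rough illustration only, here is a toy model in plain Scala with no Spark dependency. `DType`, `IntT`, `LongT`, `StringT`, `ArrayT`, `MapT`, `StructT` and `Widen.widen` are hypothetical names made up for this sketch; they are not Spark's API, and this is not Spark's own type-coercion code. It assumes int/bigint is the only atomic widening rule and that struct fields must have the same names in the same order.

```scala
// Toy model of a few Spark SQL data types. All names here are
// hypothetical and exist only for this sketch; they are not Spark's API.
sealed trait DType
case object IntT extends DType    // models int
case object LongT extends DType   // models bigint
case object StringT extends DType // models string
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType
final case class StructT(fields: List[(String, DType)]) extends DType

object Widen {
  // Atomic widening: equal types merge to themselves, and int/bigint
  // merges to bigint (assumed to be the only numeric rule, for brevity).
  private def widenAtomic(a: DType, b: DType): Option[DType] = (a, b) match {
    case (x, y) if x == y              => Some(x)
    case (IntT, LongT) | (LongT, IntT) => Some(LongT)
    case _                             => None
  }

  // Recursive widening: descend into arrays, maps and structs so that
  // nested int/bigint mismatches are resolved the same way as top-level ones.
  def widen(a: DType, b: DType): Option[DType] = (a, b) match {
    case (ArrayT(x), ArrayT(y)) =>
      widen(x, y).map(ArrayT(_))
    case (MapT(k1, v1), MapT(k2, v2)) =>
      for { k <- widen(k1, k2); v <- widen(v1, v2) } yield MapT(k, v)
    // Assumption: struct fields must have the same names in the same order.
    case (StructT(fs1), StructT(fs2)) if fs1.map(_._1) == fs2.map(_._1) =>
      val merged = fs1.zip(fs2).map { case ((name, t1), (_, t2)) =>
        widen(t1, t2).map(t => name -> t)
      }
      // Fail the whole struct if any single field cannot be widened.
      if (merged.forall(_.isDefined)) Some(StructT(merged.flatten)) else None
    case _ =>
      widenAtomic(a, b)
  }
}
```

Under this rule the `Seq(1)` / `Seq(1L)` example would merge `array<int>` with `array<bigint>` into `array<bigint>`, and the tuple example would merge `struct<_1:string,_2:int>` with `struct<_1:string,_2:bigint>` into `struct<_1:string,_2:bigint>`, just as the top-level int/bigint case already does.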
