spark-user mailing list archives

From Tim Gautier
Subject I'm pretty sure this is a Dataset bug
Date Fri, 27 May 2016 15:24:51 GMT
Unfortunately I can't show exactly the data I'm using, but this is what I'm
seeing:

I have a case class 'Product' that represents a table in our database. I
load that data via sqlContext.read.format("jdbc").options(...).load().as[Product]
and register it in a temp table 'product'.

For testing, I created a new Dataset that has only 3 records in it:

val ts = sqlContext.sql("select * from product where product_catalog_id in
(1, 2, 3)").as[Product]

I also created another one using the same case class and data, but from a
sequence instead.

val ds: Dataset[Product] = Seq(
      Product(Some(1), ...),
      Product(Some(2), ...),
      Product(Some(3), ...)
).toDS()

The spark shell tells me these are exactly the same type at this point, but
they don't behave the same.

ts.as("ts1").joinWith(ts.as("ts2"), $"ts1.product_catalog_id" ===
$"ts2.product_catalog_id")

ds.as("ds1").joinWith(ds.as("ds2"), $"ds1.product_catalog_id" ===
$"ds2.product_catalog_id")
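
For reference, here is a minimal self-contained sketch of the Seq-based self join. The two-field Product is a hypothetical stand-in, since the real case class has more fields that can't be shown here. It also assumes Spark 2.x or later with SparkSession; on 1.6 the same calls would go through sqlContext and `import sqlContext.implicits._` instead.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical stand-in for the real Product case class (the real one has more fields).
case class Product(product_catalog_id: Option[Int], name: String)

object SelfJoinSketch {
  // Alias the same Dataset twice and join it to itself on product_catalog_id.
  // col("ds1.product_catalog_id") is equivalent to $"ds1.product_catalog_id".
  def selfJoin(ds: Dataset[Product]): Dataset[(Product, Product)] =
    ds.as("ds1").joinWith(
      ds.as("ds2"),
      col("ds1.product_catalog_id") === col("ds2.product_catalog_id")
    )

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to reproduce the Seq-based case.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("self-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Placeholder rows; the real data is not shown in this message.
    val ds: Dataset[Product] = Seq(
      Product(Some(1), "a"),
      Product(Some(2), "b"),
      Product(Some(3), "c")
    ).toDS()

    selfJoin(ds).show()
    spark.stop()
  }
}
```

Swapping the Seq for a Dataset loaded over JDBC (the part that fails for me) would only change how ds is built; the selfJoin call stays the same.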

Again, spark tells me these self joins return exactly the same type, but
when I do a .show on them, only the one created from a Seq works. The one
created by reading from the database throws this error:

org.apache.spark.sql.AnalysisException: cannot resolve
'ts1.product_catalog_id' given input columns: [..., product_catalog_id,

Is this a bug? Is there any way to make the Dataset loaded from a table
behave like the one created from a sequence?

