spark-user mailing list archives

From Pierre B <pierre.borckm...@realimpactanalytics.com>
Subject [SQL] Self join with ArrayType columns problems
Date Mon, 26 Jan 2015 13:17:30 GMT
Using Spark 1.2.0, we are seeing some weird behaviour when performing a self
join on a table with an ArrayType field (potential bug?).

I have set up a minimal non-working example here:
https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f

In a nutshell, if the ArrayType column used for the pivot is declared
manually in the StructType definition, everything works as expected.
However, if the ArrayType pivot column is produced by a SQL query (whether
with an "array" wrapper or with a collect_list operator, for instance),
then the results are completely off.
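
To pin down what "works as expected" means here, the following is a minimal plain-Python sketch of the result a self join on an array-valued column should give. It is not Spark code and does not reproduce the problem. The sample rows, the column names "id" and "pivot", and the helper self_join_on_array are all hypothetical and not taken from the gist, and it assumes the pivot column is the join key. The point is only that the joined pairs should depend on the array contents, not on how the array was built.

```python
from collections import defaultdict


def self_join_on_array(rows, key="pivot"):
    """Return sorted (left_id, right_id) pairs whose array-valued key matches."""
    # Lists are not hashable, so group the row ids by a tuple copy of the array.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key])].append(row["id"])
    # Emit every pair inside each group, as an inner self join would.
    pairs = []
    for ids in groups.values():
        for left in ids:
            for right in ids:
                pairs.append((left, right))
    return sorted(pairs)


# Hypothetical sample data (not from the gist): the array is written by hand,
# analogous to declaring the column directly in the schema.
manual_rows = [
    {"id": 1, "pivot": [1, 2]},
    {"id": 2, "pivot": [1, 2]},
    {"id": 3, "pivot": [3]},
]

# Same arrays, but derived from scalar fields, analogous to building the
# column with a query-side array constructor.
scalar_rows = [
    {"id": 1, "values": (1, 2)},
    {"id": 2, "values": (1, 2)},
    {"id": 3, "values": (3,)},
]
derived_rows = [{"id": r["id"], "pivot": list(r["values"])} for r in scalar_rows]

manual_result = self_join_on_array(manual_rows)
derived_result = self_join_on_array(derived_rows)
print(manual_result)
print(derived_result)
```

Both calls return the same pairs: rows 1 and 2 match each other and themselves, and row 3 matches only itself. That is the outcome we would expect from the self join in both cases.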

Could anyone have a look? This really is a blocking issue.

Thanks! 

Cheers 

P.



