spark-user mailing list archives

From Dean Wampler <deanwamp...@gmail.com>
Subject Re: [SQL] Self join with ArrayType columns problems
Date Mon, 26 Jan 2015 13:44:10 GMT
You are creating a HiveContext, then using the sql method instead of hql.
Is that deliberate?

The code doesn't work if you replace HiveContext with SQLContext. Lots of
exceptions are thrown, but I don't have time to investigate now.

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, Jan 26, 2015 at 7:17 AM, Pierre B <pierre.borckmans@realimpactanalytics.com> wrote:

> Using Spark 1.2.0, we are facing some weird behaviour when performing a self
> join on a table with an ArrayType field.
> (potential bug?)
>
> I have set up a minimal non-working example here:
> https://gist.github.com/pierre-borckmans/4853cd6d0b2f2388bf4f
> In a nutshell, if the ArrayType column used for the pivot is created
> manually in the StructType definition, everything works as expected.
> However, if the ArrayType pivot column is produced by a SQL query (for
> instance with an "array" wrapper or a collect_list operator), then the
> results are completely off.
>
> Could anyone have a look? This really is a blocking issue.
>
> Thanks!
>
> Cheers
>
> P.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SQL-Self-join-with-ArrayType-columns-problems-tp21364.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
