spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: [Spark SQL] Unexpected Behaviour
Date Mon, 28 Mar 2016 21:52:34 GMT
Hi Jerry,

What do you expect the outcome to be?

This is Spark 1.6.1

I see the following without dropping d2("label") at all:


scala> d1.join(d2, d1("id") === d2("id"),
"left_outer").select(d1("label")).collect
res15: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0],
[0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0],
[0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0],
[0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0], [0])
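
For 1.5.x, one possible workaround (an untested sketch on my side, assuming the standard DataFrame alias API is available in that release) is to alias each side of the self-join so that column references stay unambiguous afterwards:

```scala
// Sketch of a possible workaround for Spark 1.5.x (not verified here):
// alias each DataFrame so that columns can be referenced by qualified name
// after the join, instead of by the original (ambiguous) parent reference.
val d1 = base.where($"label" === 0).as("d1")
val d2 = base.where($"label" === 1).as("d2")

d1.join(d2, $"d1.id" === $"d2.id", "left_outer")
  .select($"d1.label")
  .collect
```

The qualified names $"d1.label" and $"d2.id" resolve against the aliases rather than against the shared parent DataFrame, which sidesteps the ambiguity introduced by deriving both sides from the same base.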



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 28 March 2016 at 22:34, Jerry Lam <chilinglam@gmail.com> wrote:

> Hi spark users and developers,
>
> I'm using Spark 1.5.1 (I have no choice; it is what we use). I ran into
> some very unexpected behaviour when doing some join operations lately. I
> cannot post my actual code here, and the following code is not meant to be
> practical, but it should demonstrate the issue.
>
> val base = sc.parallelize((0 to 49).map(i => (i, 0)) ++
>   (50 to 99).map((_, 1))).toDF("id", "label")
> val d1 = base.where($"label" === 0)
> val d2 = base.where($"label" === 1)
> d1.join(d2, d1("id") === d2("id"), "left_outer")
>   .drop(d2("label")).select(d1("label"))
>
>
> The above code throws an exception saying the column "label" cannot be
> found. Is there a reason for throwing an exception here, given that
> d1("label") was never dropped (only d2("label") was)?
>
> Best Regards,
>
> Jerry
>
