spark-user mailing list archives

From Tim Gautier <tim.gaut...@gmail.com>
Subject Re: Undocumented left join constraint?
Date Fri, 27 May 2016 20:48:23 GMT
Interesting. I ran that example on 1.6.1 with Scala 2.10.

On Fri, May 27, 2016 at 2:41 PM Ted Yu <yuzhihong@gmail.com> wrote:

> Which release did you use?
>
> I tried your example on the master branch:
>
> scala> val test2 = Seq(Test(2), Test(3), Test(4)).toDS
> test2: org.apache.spark.sql.Dataset[Test] = [id: int]
>
> scala>  test1.as("t1").joinWith(test2.as("t2"), $"t1.id" === $"t2.id",
> "left_outer").show
> +---+------+
> | _1|    _2|
> +---+------+
> |[1]|[null]|
> |[2]|   [2]|
> |[3]|   [3]|
> +---+------+
>
> On Fri, May 27, 2016 at 1:01 PM, Tim Gautier <tim.gautier@gmail.com>
> wrote:
>
>> Is it truly impossible to left join a Dataset[T] on the right if T has
>> any non-Option fields? Spark seems to try to create instances of T with
>> null in every field when left joining, which results in null pointer
>> exceptions. In fact, I haven't found any way around this issue other
>> than making every field in T an Option. Is there any other way? (A
>> sketch of that workaround follows the quoted thread.)
>>
>> Example:
>>
>>     case class Test(id: Int)
>>     val test1 = Seq(Test(1), Test(2), Test(3)).toDS
>>     val test2 = Seq(Test(2), Test(3), Test(4)).toDS
>>     test1.as("t1").joinWith(test2.as("t2"), $"t1.id" === $"t2.id",
>> "left_outer").show
>>
>>
>
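
For reference, a minimal sketch of the all-Option workaround described in
the original question. TestOpt is an illustrative name, and the snippet
assumes a Spark shell session with the SQLContext implicits already in
scope:

    // Hypothetical all-Option variant of Test: with no primitive fields,
    // an unmatched right side no longer forces a null into an Int.
    case class TestOpt(id: Option[Int])

    val test1 = Seq(TestOpt(Some(1)), TestOpt(Some(2)), TestOpt(Some(3))).toDS
    val test2 = Seq(TestOpt(Some(2)), TestOpt(Some(3)), TestOpt(Some(4))).toDS

    // Option[Int] is encoded as a nullable int column, so the join
    // condition is unchanged; the left outer join now completes without
    // a NullPointerException and keeps all three left-side rows.
    test1.as("t1").joinWith(test2.as("t2"), $"t1.id" === $"t2.id",
      "left_outer").show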
