spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Error joining dataframes
Date Wed, 18 May 2016 07:14:10 GMT
Look weird, seems spark-v1.5.x can accept the query.
What's the difference between the example and your query?

####

Welcome to

      ____              __

     / __/__  ___ _____/ /__

    _\ \/ _ \/ _ `/ __/  '_/

   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2

      /_/

scala> :paste

// Entering paste mode (ctrl-D to finish)

val df1 = Seq((1, 0), (2, 0)).toDF("id", "A")

val df2 = Seq((2, 0), (3, 0)).toDF("id", "B")

df1.join(df2, df1("id") === df2("id"), "outer").show


// Exiting paste mode, now interpreting.


+----+----+----+----+

|  id|   A|  id|   B|

+----+----+----+----+

|   1|   0|null|null|

|   2|   0|   2|   0|

|null|null|   3|   0|

+----+----+----+----+


df1: org.apache.spark.sql.DataFrame = [id: int, A: int]

df2: org.apache.spark.sql.DataFrame = [id: int, B: int]





On Wed, May 18, 2016 at 3:52 PM, ram kumar <ramkumarrock3@gmail.com> wrote:

> I tried
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> But there is a duplicate "id" and when I query the "id", I get
> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>
> I am currently using spark 1.5.2.
> Is there any alternative way in 1.5
>
> Thanks
>
> On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
> wrote:
>
>> Also, you can pass the query that you'd like to use in spark-v1.6+;
>>
>> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
>> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
>> df1.join(df2, df1("id") === df2("id"), "outer").show
>>
>> // maropu
>>
>>
>> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarrock3@gmail.com>
>> wrote:
>>
>>> If I run as
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> It takes as inner join.
>>>
>>>
>>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> pretty simple, a similar construct to tables projected as DF
>>>>
>>>> val c =
>>>> HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>>> val t =
>>>> HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpathak@mtu.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Try this one:
>>>>>
>>>>>
>>>>> df_join = df1.*join*(df2, 'Id', "fullouter")
>>>>>
>>>>> Thanks,
>>>>> Bijay
>>>>>
>>>>>
>>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarrock3@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I tried to join two dataframe
>>>>>>
>>>>>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter")
>>>>>>
>>>>>> df_join.registerTempTable("join_test")
>>>>>>
>>>>>>
>>>>>> When querying "Id" from "join_test"
>>>>>>
>>>>>> 0: jdbc:hive2://> *select Id from join_test;*
>>>>>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>>>>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>>> 0: jdbc:hive2://>
>>>>>>
>>>>>> Is there a way to merge the value of df1("Id") and df2("Id") into
one
>>>>>> "Id"
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> ---
>> Takeshi Yamamuro
>>
>
>


-- 
---
Takeshi Yamamuro

Mime
View raw message