spark-user mailing list archives

From ram kumar <ramkumarro...@gmail.com>
Subject Re: Error joining dataframes
Date Wed, 18 May 2016 06:52:05 GMT
I tried
df1.join(df2, df1("id") === df2("id"), "outer").show

But the result has a duplicate "id" column, and when I query "id" I get
Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)

I am currently using Spark 1.5.2.
Is there an alternative way to do this in 1.5?

Thanks
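[Editor's note: a sketch of one workaround that should work on Spark 1.5, since the using-columns join with a join type only arrives in 1.6. The idea is to rename the key on one side before the outer join, then coalesce the two key columns back into a single "Id". The df1/df2 toy frames here are assumptions in the spirit of the examples later in the thread, and `toDF` relies on the implicits that spark-shell imports automatically.]

```scala
// Assumes spark-shell 1.5, where sqlContext.implicits._ is already imported.
import org.apache.spark.sql.functions.coalesce

// Hypothetical toy frames standing in for the thread's df1/df2.
val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("Id", "A")
val df2 = Seq((2, 0), (3, 0), (4, 0)).toDF("Id", "B")

// Rename the key on one side so the join output has no duplicate "Id".
val df2r   = df2.withColumnRenamed("Id", "Id2")
val joined = df1.join(df2r, df1("Id") === df2r("Id2"), "outer")

// Merge the two keys back into a single, unambiguous "Id" column.
val merged = joined.select(
  coalesce(joined("Id"), joined("Id2")).as("Id"),
  joined("A"),
  joined("B"))

merged.registerTempTable("join_test")  // "select Id from join_test" now resolves
```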

On Wed, May 18, 2016 at 12:12 PM, Takeshi Yamamuro <linguin.m.s@gmail.com>
wrote:

> Also, in spark-v1.6+ you can pass the join type that you'd like to use:
>
> val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
> val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")
> df1.join(df2, df1("id") === df2("id"), "outer").show
>
> // maropu
>
>
> On Wed, May 18, 2016 at 3:29 PM, ram kumar <ramkumarrock3@gmail.com>
> wrote:
>
>> If I run as
>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>
>> It defaults to an inner join.
>>
>>
>> On Wed, May 18, 2016 at 2:31 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Pretty simple, a similar construct with the tables projected as DataFrames:
>>>
>>> val c = HiveContext.table("channels").select("CHANNEL_ID","CHANNEL_DESC")
>>> val t = HiveContext.table("times").select("TIME_ID","CALENDAR_MONTH_DESC")
>>> val rs = s.join(t,"time_id").join(c,"channel_id")
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 17 May 2016 at 21:52, Bijay Kumar Pathak <bkpathak@mtu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> Try this one:
>>>>
>>>>
>>>> df_join = df1.join(df2, 'Id', "fullouter")
>>>>
>>>> Thanks,
>>>> Bijay
>>>>
>>>>
>>>> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarrock3@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I tried to join two dataframes
>>>>>
>>>>> df_join = df1.join(df2, df1("Id") === df2("Id"), "fullouter")
>>>>>
>>>>> df_join.registerTempTable("join_test")
>>>>>
>>>>>
>>>>> When querying "Id" from "join_test"
>>>>>
>>>>> 0: jdbc:hive2://> select Id from join_test;
>>>>> Error: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>>>>> ambiguous, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>>>>> 0: jdbc:hive2://>
>>>>>
>>>>> Is there a way to merge the values of df1("Id") and df2("Id") into a
>>>>> single "Id" column?
>>>>>
>>>>> Thanks
>>>>>
>>>>
>>>>
>>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>
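[Editor's note: for later readers on Spark 1.6+, the Seq-based using-columns join also accepts a join type and keeps only one copy of the key, which sidesteps the ambiguity entirely. A sketch, reusing the toy df1/df2 from Takeshi's example above and assuming the spark-shell implicits for `toDF` are in scope:]

```scala
// Spark 1.6+: join(right, usingColumns, joinType) keeps a single "id" column.
val df1 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "A")
val df2 = Seq((1, 0), (2, 0), (3, 0)).toDF("id", "B")

val joined = df1.join(df2, Seq("id"), "outer")
joined.select("id").show()  // no "ambiguous reference" error
```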
