spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ram kumar <ramkumarro...@gmail.com>
Subject Re: Error joining dataframes
Date Wed, 18 May 2016 06:16:46 GMT
I tried

scala> var df_join = df1.join(df2, "Id", "fullouter")
<console>:27: error: type mismatch;
 found   : String("Id")
 required: org.apache.spark.sql.Column
       var df_join = df1.join(df2, "Id", "fullouter")
                                   ^

scala>

And I cant see the above method in
https://spark.apache.org/docs/1.5.1/api/java/org/apache/spark/sql/DataFrame.html#join(org.apache.spark.sql.DataFrame,%20org.apache.spark.sql.Column,%20java.lang.String)

On Wed, May 18, 2016 at 2:22 AM, Bijay Kumar Pathak <bkpathak@mtu.edu>
wrote:

> Hi,
>
> Try this one:
>
>
> df_join = df1.*join*(df2, 'Id', "fullouter")
>
> Thanks,
> Bijay
>
>
> On Tue, May 17, 2016 at 9:39 AM, ram kumar <ramkumarrock3@gmail.com>
> wrote:
>
>> Hi,
>>
>> I tried to join two dataframe
>>
>> df_join = df1.*join*(df2, ((df1("Id") === df2("Id")), "fullouter")
>>
>> df_join.registerTempTable("join_test")
>>
>>
>> When querying "Id" from "join_test"
>>
>> 0: jdbc:hive2://> *select Id from join_test;*
>> *Error*: org.apache.spark.sql.AnalysisException: Reference 'Id' is
>> *ambiguous*, could be: Id#128, Id#155.; line 1 pos 7 (state=,code=0)
>> 0: jdbc:hive2://>
>>
>> Is there a way to merge the value of df1("Id") and df2("Id") into one "Id"
>>
>> Thanks
>>
>
>

Mime
View raw message