spark-user mailing list archives

From Deepak Sharma <deepakmc...@gmail.com>
Subject Re: Filtering in SparkR
Date Mon, 03 Oct 2016 07:24:55 GMT
Hi Yogesh
You can try registering these 2 DFs as temporary tables and then executing a
SQL query.
df1.registerTempTable("df1")
df2.registerTempTable("df2")

val rs = sqlContext.sql("SELECT a.* FROM df1 a WHERE a.id NOT IN (SELECT b.id FROM df2 b)")

Note that a plain cross join with "a.id != b.id" would not do what you want:
it keeps a df1 row once for every df2 row it differs from, so matching ids
still survive and rows get duplicated. A NOT IN (anti-join) condition keeps a
df1 row only when its id matches no id in df2.
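The difference between the two query shapes can be sketched in plain Python, using a few hypothetical ids (not from the thread):

```python
# Hypothetical sample ids to illustrate the two query semantics.
df1_ids = [1.0, 2.0, 3.0]
df2_ids = [2.0, 4.0]

# "a.id != b.id" cross join: a df1 row is kept once per df2 row it differs
# from, so id 2.0 still survives (it differs from 4.0) and rows duplicate.
cross_join = [a for a in df1_ids for b in df2_ids if a != b]

# "NOT IN" anti-join: a df1 row is kept only if its id matches no df2 id.
anti_join = [a for a in df1_ids if a not in df2_ids]

print(cross_join)  # [1.0, 1.0, 2.0, 3.0, 3.0]
print(anti_join)   # [1.0, 3.0]
```

The anti-join version matches the stated goal of filtering out df1 rows whose id appears in df2.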

Thanks
Deepak

On Mon, Oct 3, 2016 at 12:38 PM, Yogesh Vyas <informyogi@gmail.com> wrote:

> Hi,
>
> I have two SparkDataFrames, df1 and df2.
> Their schemas are as follows:
> df1=>SparkDataFrame[id:double, c1:string, c2:string]
> df2=>SparkDataFrame[id:double, c3:string, c4:string]
>
> I want to filter out rows from df1 where df1$id does not match df2$id
>
> I tried the expression filter(df1, !(df1$id %in% df2$id)), but it does
> not work.
>
> Could anybody please provide me a solution for it?
>
> Regards,
> Yogesh
>



-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
