spark-user mailing list archives

From Shuporno Choudhury <shuporno.choudh...@gmail.com>
Subject Multiple columns using 'isin' command in pyspark
Date Thu, 29 Mar 2018 15:52:57 GMT
Hi Spark Users,

I am trying to achieve the 'IN' functionality of SQL using the isin
function in pyspark
Eg:     select count(*) from tableA
          where (col1, col2) in ((1, 100),(2, 200), (3,300))

An isin statement on a single column works fine:
    df.filter(df[0].isin(1, 2, 3)).count()

But can I use multiple columns in that statement, like:
    df.filter((df[0], df[1]).isin((1, 100), (2, 200), (3, 300))).count()

Is this possible to achieve?
Or do I have to create multiple equality conditions, merge them using the '&'
and '|' operators, and then execute the statement to get the final result?

Any help would be really appreciated.

-- 
Thanks,
Shuporno Choudhury
