spark-user mailing list archives

From Abhinesh Hada
Subject [Spark SQL]: Does Union operation followed by drop duplicate follows "keep first"
Date Fri, 13 Sep 2019 15:43:28 GMT

I am trying to take the union of two DataFrames and then drop duplicates
based on the value of a specific column. However, I want to make sure that
when duplicates are dropped, the rows from the first DataFrame are kept.

df1 = df1.union(df2).dropDuplicates(['id'])
