spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Usage of DropDuplicate in Spark
Date Tue, 22 Jun 2021 17:23:55 GMT
I am looking for a built-in API for this, if one exists at all.

On Tue, Jun 22, 2021 at 1:16 PM Chetan Khatri <chetan.opensource@gmail.com>
wrote:

> This approach has been very slow.
>
> On Tue, Jun 22, 2021 at 1:15 PM Sachit Murarka <connectsachit@gmail.com>
> wrote:
>
>> Hi Chetan,
>>
>> You can subtract the DataFrames, or use the except operation.
>> The first DF contains all rows.
>> The second DF contains the unique rows (after removing duplicates).
>> Subtract the second DF from the first.
>>
>> Hope this helps.
>>
>> Thanks
>> Sachit
>>
>> On Tue, Jun 22, 2021, 22:23 Chetan Khatri <chetan.opensource@gmail.com>
>> wrote:
>>
>>> Hi Spark Users,
>>>
>>> I want to use dropDuplicates, but I would also like to log the records
>>> that it discards to an instrumentation table.
>>>
>>> What would be the best approach to do that?
>>>
>>> Thanks
>>>
>>
