spark-user mailing list archives

From Georg Heiler <georg.kf.hei...@gmail.com>
Subject Re: Efficient way to compare the current row with previous row contents
Date Mon, 12 Feb 2018 12:33:25 GMT
You should look into window functions in Spark SQL.
Debabrata Ghosh <mailfordebu@gmail.com> wrote on Mon, 12 Feb 2018 at
13:10:

> Hi,
>                  Greetings!
>
>                  I need an efficient way in PySpark to compare the
> current row with the previous row across all attributes. My intent is to
> make the most of Spark's distributed framework so that I can achieve good
> speed. Could anyone suggest a suitable algorithm or command? Here is a
> snapshot of the underlying data that I need to compare:
>
> [image: Inline image 1]
>
> Thanks in advance!
>
> D
>
