spark-user mailing list archives

From Debabrata Ghosh
Subject Efficient way to compare the current row with previous row contents
Date Mon, 12 Feb 2018 12:10:13 GMT
Greetings!

I need an efficient way in PySpark to compare the current row with the
previous row across all attributes. My intent is to make the best use of
Spark's distributed framework so that I can achieve good speed. Can anyone
please suggest a suitable algorithm or command? Here is a snapshot of the
underlying data that I need to compare:

[image: Inline image 1]

Thanks in advance!

