Greetings!

I need an efficient way in PySpark to compare the current row with the previous row across all attributes. My intent is to leverage Spark's distributed framework as much as possible to get good speed. Can anyone suggest a suitable algorithm or command? Here is a snapshot of the underlying data I need to compare:

[inline image: snapshot of the sample data, not reproduced here]

Thanks in advance!