What’s your PySpark function? Is it a UDF? If so, consider using a pandas UDF, introduced in Spark 2.3.

Sent from my iPhone
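As a rough sketch of the pandas UDF idea (assuming Spark 2.3+, a running SparkSession, and a numeric column; the column name `value` and the tripling logic are placeholders, not the poster's actual function), the key point is that the function receives a whole pandas Series (a batch of rows) at once instead of being called once per row:

```python
import pandas as pd

# Vectorized transformation: operates on a whole pandas Series (a batch
# of rows) at once, instead of one Python call per row.
def per_row_logic_vectorized(s: pd.Series) -> pd.Series:
    return s * 3  # placeholder for the real per-row computation

# In Spark 2.3+ the same function can be wrapped as a pandas UDF so that
# Spark feeds it Arrow-backed batches in parallel across executors
# (sketch only; requires a SparkSession and pyarrow):
#
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import DoubleType
#
#   triple = pandas_udf(per_row_logic_vectorized, DoubleType())
#   df = df.withColumn("result", triple(df["value"]))

print(per_row_logic_vectorized(pd.Series([1.0, 2.0])).tolist())  # [3.0, 6.0]
```

Because the Python interpreter is entered once per batch rather than once per row, this usually cuts most of the per-row overhead that makes plain Python UDFs slow.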
On Mar 18, 2018, at 10:54 PM, Debabrata Ghosh <email@example.com> wrote:

Hi,

My dataframe has 2000 rows. Processing each row takes 3 seconds, so running sequentially takes 2000 * 3 = 6000 seconds, which is a very long time.

Further, I am contemplating running the function in parallel. For example, I would like to divide the total rows in my dataframe by 4, prepare sets of 500 rows each, and call my PySpark function on them in parallel. I wanted to know if there is any library / PySpark function I can leverage to do this execution in parallel.

I will really appreciate your feedback at your earliest convenience.

Thanks,
Debu
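For what it's worth, inside Spark the natural way to get this split is `df.repartition(4)` (or `rdd.mapPartitions`), which lets Spark run each partition on a separate executor rather than chunking by hand. The chunk-into-4x500-and-run-in-parallel idea itself can be sketched with just the standard library (`process_chunk` and the doubling logic are hypothetical stand-ins for the real per-row work):

```python
from multiprocessing.dummy import Pool  # thread-backed pool, portable sketch

def process_chunk(rows):
    # Placeholder for the real per-row computation; here we just double values.
    return [r * 2 for r in rows]

rows = list(range(2000))             # stand-in for the 2000-row dataframe
chunk_size = 500                     # 2000 / 4 = 500 rows per chunk
chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]

with Pool(4) as pool:                # four workers, one per chunk
    results = pool.map(process_chunk, chunks)

flat = [x for chunk in results for x in chunk]
print(len(chunks), len(flat))        # 4 2000
```

Note that if the 3-seconds-per-row cost comes from a plain Python UDF, repartitioning only spreads the slowness across workers; switching to a vectorized pandas UDF attacks the per-row overhead itself.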