spark-user mailing list archives

From Rahul Nandi <>
Subject Unit testing PySpark Code and doing assertion
Date Tue, 03 Sep 2019 15:04:29 GMT
I'm trying to unit test my PySpark DataFrame code. My goal is to assert on
both the schema and the data of the DataFrames. Are there any known
libraries I can use for this? Any library that can handle 10-15 records in
a DataFrame is good enough for me.

As of now I'm using the unittest library and its *assertCountEqual* method
to do the assertion. This works reasonably well for the data, but it does
not validate the schema, and the failure message is hard to read.
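
To make the gap concrete, here is a minimal sketch of the kind of check I
have in mind. It is only an illustration under some assumptions: the helper
name assert_rows_and_schema is made up, the rows are plain tuples standing in
for what df.collect() returns, and the schema is a list of (name, type)
pairs standing in for df.dtypes, so the sketch runs without a Spark session.

```python
import unittest


def assert_rows_and_schema(case, actual_dtypes, actual_rows,
                           expected_dtypes, expected_rows):
    """Hypothetical helper: check the schema first, then the rows.

    In real PySpark code the arguments would come from df.dtypes and
    df.collect(); plain lists of tuples are used here as stand-ins.
    """
    # Schema check: column names and type strings must match, in order.
    case.assertEqual(expected_dtypes, actual_dtypes, "schema mismatch")
    # Data check: same rows with the same multiplicity, order ignored.
    case.assertCountEqual(expected_rows, actual_rows, "data mismatch")


class ExampleTest(unittest.TestCase):
    def test_same_rows_in_different_order(self):
        dtypes = [("id", "bigint"), ("name", "string")]
        assert_rows_and_schema(
            self,
            dtypes, [(2, "b"), (1, "a")],
            dtypes, [(1, "a"), (2, "b")],
        )

    def test_schema_drift_is_caught(self):
        rows = [(1, "a")]
        # The rows are identical, so assertCountEqual alone would pass;
        # only the explicit schema comparison notices int vs bigint.
        with self.assertRaises(AssertionError):
            assert_rows_and_schema(
                self,
                [("id", "int"), ("name", "string")], rows,
                [("id", "bigint"), ("name", "string")], rows,
            )
```

Saved in a test module, this can be run with "python -m unittest". Checking
the schema before the data means a type or column-name mismatch is reported
on its own instead of being buried in a row-level diff.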

If any of you are using any special techniques, let me know. Thanks
in advance.

