spark-user mailing list archives

From Magnus Nilsson <>
Subject Logging DataFrame API pipelines
Date Tue, 02 Apr 2019 22:43:17 GMT
Hello all,

How do you log what is happening inside your Spark DataFrame pipelines?

I would like to collect statistics along the way, mostly row counts at
particular steps, to see where rows were filtered out and so on. Is there
any way to do this other than calling .count on the DataFrame?


