spark-user mailing list archives

From Tim Moran <>
Subject OutputMetrics with data frames (spark-avro)
Date Mon, 17 Oct 2016 12:37:19 GMT

I'm using the Databricks spark-avro library to save some DataFrames out as
Avro (with Spark 1.6.1). When I do this, however, I lose the information in
the Spark events about the number of records and size of data written to
HDFS for each partition, which is available if I save an RDD out as a text
file.

Is this just a limitation of data frames, or is there a way of making this
information available? It's really useful for performance monitoring.
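
To make concrete which information is meant, here is a minimal sketch that
sums the output metrics per stage from a Spark event log. It assumes the log
is one JSON object per line and that task-end events carry "Stage ID" and a
"Task Metrics" object with an optional "Output Metrics" entry holding
"Records Written" and "Bytes Written". Those key names are my understanding
of the event log format and should be checked against a real log. The
function name and the sample events are made up for illustration.

```python
import json
from collections import defaultdict


def output_metrics_by_stage(lines):
    """Sum records and bytes written per stage from event-log lines.

    Key names ("Task Metrics", "Output Metrics", ...) are assumptions
    about the event log JSON; verify them against an actual log file.
    """
    totals = defaultdict(
        lambda: {"records": 0, "bytes": 0, "tasks_without_metrics": 0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Only task-end events carry per-task metrics.
        if event.get("Event") != "SparkListenerTaskEnd":
            continue
        stage = event.get("Stage ID")
        metrics = (event.get("Task Metrics") or {}).get("Output Metrics")
        if metrics is None:
            # The case described above: the task ended but reported
            # no output metrics at all.
            totals[stage]["tasks_without_metrics"] += 1
            continue
        totals[stage]["records"] += metrics.get("Records Written", 0)
        totals[stage]["bytes"] += metrics.get("Bytes Written", 0)
    return dict(totals)


# Made-up sample events, not taken from a real job.
sample = [
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Metrics": {"Output Metrics": {
                    "Records Written": 100, "Bytes Written": 2048}}}),
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 0,
                "Task Metrics": {"Output Metrics": {
                    "Records Written": 50, "Bytes Written": 1024}}}),
    json.dumps({"Event": "SparkListenerTaskEnd", "Stage ID": 1,
                "Task Metrics": {}}),
    json.dumps({"Event": "SparkListenerJobEnd", "Job ID": 0}),
]

result = output_metrics_by_stage(sample)
print(result)
```

In the sample, stage 0 stands in for the text file save, where the totals
are populated, and stage 1 for the Avro save, where the tasks are counted
but no records or bytes are reported.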


