spark-user mailing list archives

From Rishi Shah <>
Subject [Pyspark 2.4] Large number of row groups in parquet files created using spark
Date Thu, 25 Jul 2019 01:29:10 GMT
Hi All,

I have the following code, which produces a single 600 MB parquet file as
expected. However, this parquet file contains 42 row groups! I would expect
it to create at most 6 row groups. Could someone please shed some light on
this? Is there a config setting I can enable when submitting the application
with spark-submit?

df =

I did try --conf spark.parquet.block.size and spark.dfs.blocksize, but that
made no difference.
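
To make the expectation above concrete, here is a rough back-of-the-envelope sketch. It assumes a 128 MB target row group size (the usual parquet-mr default for parquet.block.size) and ignores compression and encoding, so it is only an estimate. The helper names are hypothetical and only for illustration.

```python
import math

MB = 1024 * 1024


def expected_row_groups(file_size_bytes, row_group_bytes=128 * MB):
    """Estimate the row group count as file size / target row group size, rounded up.

    Assumption: on-disk size is used as a stand-in for the buffered size
    that the writer actually compares against the target.
    """
    if row_group_bytes <= 0:
        raise ValueError("row_group_bytes must be positive")
    return math.ceil(file_size_bytes / row_group_bytes)


def average_row_group_mb(file_size_bytes, row_group_count):
    """Average on-disk size per row group, in MB."""
    if row_group_count <= 0:
        raise ValueError("row_group_count must be positive")
    return file_size_bytes / row_group_count / MB


if __name__ == "__main__":
    size = 600 * MB
    # Estimated count at a 128 MB target
    print(expected_row_groups(size))
    # Average size implied by the 42 row groups actually observed
    print(round(average_row_group_mb(size, 42), 1))
```

With a 600 MB file this gives about 5 row groups at a 128 MB target, whereas 42 row groups works out to roughly 14 MB each on disk.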


Rishi Shah
