spark-user mailing list archives

From Kevin Tran <>
Subject Spark app write too many small parquet files
Date Mon, 28 Nov 2016 05:44:34 GMT
Hi Everyone,
Does anyone know the best practice for writing parquet files from Spark?

When our Spark app writes data to parquet, the output directory ends up containing heaps of very small parquet files (such as e73f47ef-4421-4bcc-a4db-a56b110c3089.parquet).

Should it instead write fewer, larger files (for example, around 128 MB each), with an appropriate number of files overall?

Has anyone observed performance changes when varying the size of each parquet file?
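One common approach (not from this thread, just a sketch) is to repartition the DataFrame down to a target number of output files before writing, so each file lands near a chosen size such as 128 MB. The helper name, the 128 MB target, and the size estimate are all assumptions for illustration:

```python
def target_partitions(total_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    """Number of output files needed so each is roughly target_file_bytes."""
    # Ceiling division; always at least one partition.
    return max(1, -(-total_bytes // target_file_bytes))

# In a Spark job (PySpark shown; requires a SparkSession and real data):
#   n = target_partitions(estimated_size_bytes)
#   df.repartition(n).write.parquet("out/path")
# When only *reducing* the partition count, coalesce(n) avoids a full shuffle:
#   df.coalesce(n).write.parquet("out/path")

print(target_partitions(1 * 1024 * 1024))    # -> 1 (1 MB fits in one file)
print(target_partitions(512 * 1024 * 1024))  # -> 4 (512 MB / 128 MB)
```

Estimating `estimated_size_bytes` up front is the hard part in practice; some people sample the data or use the size of a previous run's output as a proxy.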
