spark-user mailing list archives

From Maurin Lenglart <mau...@cuberonlabs.com>
Subject dynamic coalesce to pick file size
Date Tue, 26 Jul 2016 19:02:45 GMT
Hi,
I am running a SQL query that returns a DataFrame. I then write the result of the query
using “df.write”, but the result gets written as a lot of small files (~100 files of about
200 KB each). So for now I am calling “.coalesce(2)” before the write.
But the number “2” that I picked is static. Is there a way of dynamically picking
the number depending on the desired file size? (Around 256 MB would be perfect.)

I am running Spark 1.6 on CDH with YARN; the files are written in Parquet format.
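
To illustrate the kind of calculation I have in mind, here is a minimal sketch in Python. It assumes I can somehow get an estimate of the total output size in bytes (for example from the size of a previous run's output), which is exactly the part I do not know how to do well. The names `pick_num_partitions` and `estimated_output_bytes` are hypothetical, not part of Spark.

```python
# Desired size per output file: roughly 256 MB.
TARGET_FILE_BYTES = 256 * 1024 * 1024


def pick_num_partitions(estimated_output_bytes, target_file_bytes=TARGET_FILE_BYTES):
    """Return how many output files to write so each is about target_file_bytes.

    estimated_output_bytes is an assumed, externally supplied estimate of the
    total size of the written data; this helper does not compute it.
    """
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    if estimated_output_bytes < 0:
        raise ValueError("estimated_output_bytes must not be negative")
    # Integer ceiling division, so a partial last file still gets its own slot.
    needed = (estimated_output_bytes + target_file_bytes - 1) // target_file_bytes
    # Always write at least one file, even for an empty or tiny result.
    return max(1, needed)
```

The result would then replace the static “2”, along the lines of `df.coalesce(pick_num_partitions(estimated_output_bytes)).write.parquet(path)`. Since coalesce only reduces the number of partitions, a value larger than the current partition count would have no effect.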

Thanks
