spark-user mailing list archives

From Deepak Sharma <deepakmc...@gmail.com>
Subject Re: Why so many parquet file part when I store data in Alluxio or File?
Date Fri, 01 Jul 2016 04:01:56 GMT
Before writing, coalesce your RDD to 1 partition.
Spark will then create only 1 output file.
Multiple part files appear because your executors write each of their
partitions to a separate part file.
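
A minimal PySpark sketch of the idea, in case it helps. The helper names
(target_partitions, write_parquet_with_parts), the DataFrame df, and the
output path are hypothetical; only coalesce() and write.parquet() are
Spark calls. If you want part files close to a given size instead of a
single file, you can estimate the partition count from the total data
size, which you have to measure or guess yourself.

```python
def target_partitions(total_bytes: int, target_file_bytes: int) -> int:
    """Estimate how many partitions give part files of about target_file_bytes.

    Hypothetical helper: Spark does not report total_bytes for you here,
    so the caller supplies an estimate.
    """
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    if total_bytes <= 0:
        return 1
    # Integer ceiling division, always at least one partition.
    return max(1, (total_bytes + target_file_bytes - 1) // target_file_bytes)


def write_parquet_with_parts(df, path: str, num_parts: int) -> None:
    """Write df as Parquet with at most num_parts part files.

    coalesce(n) merges existing partitions down to n without a full
    shuffle; it cannot increase the partition count. Spark then writes
    one part file per partition.
    """
    df.coalesce(num_parts).write.parquet(path)


# Example (assumes an existing DataFrame `df`; the path is a placeholder):
# write_parquet_with_parts(df, "alluxio://host:port/path/to/output", 1)
```

Keep in mind that with 1 partition a single task writes all the data, so
this is only reasonable when the output is small.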

Thanks
Deepak
On 1 Jul 2016 8:01 am, "Chanh Le" <giaosudau@gmail.com> wrote:

Hi everyone,
I am using Alluxio for storage, but I am a little confused: I set the
Alluxio block size to 512 MB, yet each part file is only a few KB and
there are too many of them.
Is that normal? I want to read the data fast. Does having that many
parts affect read performance?
How do I set the size of a part file?

Thanks.
Chanh
