spark-user mailing list archives

From Yaniv Harpaz <>
Subject Re: Avro file question
Date Mon, 04 Nov 2019 17:28:09 GMT
It depends on your usage (when and how you read the data).
Are the smaller files you were thinking about still larger than the HDFS
block size?
I would not go for anything smaller than a block.

Usually (if it is relevant to the way you read the data) the partitioning
scheme helps determine the file sizes.
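
To put rough numbers on the block-size point, here is a minimal sketch in plain Python (no Spark needed). It assumes a 128 MiB block size, a common HDFS default that you should check against your cluster, and it uses a hypothetical helper, estimate_tasks, that counts one read task per block-sized split of each file. It ignores that Spark can pack several small files into one partition and that Hadoop allows some slop on the last split, so treat the result as an order-of-magnitude estimate only.

```python
# Assumption: 128 MiB block size; check dfs.blocksize on your cluster.
MIB = 1024 ** 2
GIB = 1024 ** 3
BLOCK_SIZE = 128 * MIB


def estimate_tasks(file_sizes, split_size=BLOCK_SIZE):
    """Rough number of read tasks: ceil(size / split_size) per non-empty file.

    Hypothetical helper for illustration; real split planning has more rules.
    """
    # -(-a // b) is integer ceiling division, which avoids float rounding.
    return sum(-(-size // split_size) for size in file_sizes if size > 0)


# The same 10 GiB of data laid out three different ways.
one_large = estimate_tasks([10 * GIB])          # a single large file
block_sized = estimate_tasks([128 * MIB] * 80)  # 80 files of one block each
tiny = estimate_tasks([8 * MIB] * 1280)         # 1280 files of 8 MiB each

print(one_large, block_sized, tiny)  # prints: 80 80 1280
```

Under these assumptions the single large file and the block-sized files give the same parallelism, while the sub-block files multiply the number of tasks (and files to track) for the same amount of data.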

Yaniv Harpaz
[ yaniv.harpaz at ]

On Mon, Nov 4, 2019 at 7:03 PM Sam <> wrote:

> Hi,
> How do we choose between a single large avro file (size much larger than
> HDFS block size) vs multiple smaller avro files (close to HDFS block size)?
> Since avro is splittable, is there even a need to split a very large avro
> file into smaller files?
> I’m assuming that a single large avro file can also be split into multiple
> mappers/reducers/executors during processing.
> Thanks.
