spark-user mailing list archives

From Yaniv Harpaz <yaniv.har...@gmail.com>
Subject Re: Avro file question
Date Mon, 04 Nov 2019 17:28:09 GMT
It depends on your usage (when and how you read the data).
Would the smaller files you have in mind still be at least as large as the
HDFS block size?
I would not go for anything smaller than a block.
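To make the "not smaller than a block" point concrete, here is a rough Python sketch of how Hadoop's FileInputFormat turns files into input splits: split size = max(minSize, min(maxSize, blockSize)), each file is split on its own, and the last split may be up to 10% larger than the split size. The function names are hypothetical and this is only an approximation. Avro readers align to sync markers, and Spark's DataFrame reader instead packs small files into partitions up to spark.sql.files.maxPartitionBytes, so the exact task count there will differ.

```python
# Hypothetical sketch of Hadoop FileInputFormat-style split counting.
SPLIT_SLOP = 1.1  # last split may be up to 10% larger than split_size


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    # Same formula as FileInputFormat.computeSplitSize.
    return max(min_size, min(max_size, block_size))


def split_lengths(file_size, split_size):
    """Byte lengths of the splits for one splittable file (e.g. Avro)."""
    splits = []
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    if remaining != 0:
        splits.append(remaining)
    return splits


def count_splits(file_sizes, block_size):
    """Total splits (roughly, map tasks) for a list of file sizes in bytes."""
    split_size = compute_split_size(block_size)
    return sum(len(split_lengths(size, split_size)) for size in file_sizes)


if __name__ == "__main__":
    MB = 1024 * 1024
    BLOCK = 128 * MB
    print(count_splits([1024 * MB], BLOCK))       # one 1 GB file -> 8
    print(count_splits([128 * MB] * 8, BLOCK))    # eight block-sized files -> 8
    print(count_splits([1 * MB] * 1024, BLOCK))   # 1024 tiny files -> 1024
```

Under these assumptions one large splittable file and a handful of block-sized files give the same parallelism, while files far smaller than a block multiply the number of splits (and NameNode entries) without adding any benefit.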

Usually, if it is relevant to the way you read the data, the partitioning
scheme helps determine that.

Yaniv Harpaz
[ yaniv.harpaz at gmail.com ]


On Mon, Nov 4, 2019 at 7:03 PM Sam <games2013.sam@gmail.com> wrote:

> Hi,
>
> How do we choose between a single large Avro file (size much larger than
> the HDFS block size) and multiple smaller Avro files (close to the HDFS
> block size)?
>
> Since Avro is splittable, is there even a need to split a very large Avro
> file into smaller files?
>
> I'm assuming that a single large Avro file can also be split across multiple
> mappers/reducers/executors during processing.
>
> Thanks.
>
