spark-user mailing list archives

From Sam <>
Subject Avro file question
Date Mon, 04 Nov 2019 17:03:29 GMT

How do we choose between a single large Avro file (much larger than the HDFS
block size) and multiple smaller Avro files (each close to the HDFS block size)?

Since Avro is splittable, is there even a need to split a very large Avro
file into smaller files?

I’m assuming that a single large Avro file can also be split across multiple
mappers/reducers/executors during processing.
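
To make the assumption concrete, here is a rough back-of-the-envelope sketch
of how many input splits each layout might produce. The helper name
estimate_splits, the 128 MiB split size, and the file sizes are all
hypothetical values chosen for illustration. It assumes a splittable file is
simply cut into chunks of one split size each, and it ignores Avro sync-marker
alignment and any adjustments the engine makes for file-open cost or
parallelism.

```python
MIB = 1024 * 1024


def estimate_splits(file_size_bytes, split_size_bytes=128 * MIB):
    """Rough count of input splits for one splittable file.

    Assumes the file is cut into chunks of split_size_bytes, with a final
    smaller chunk for any remainder. This is a simplification, not the exact
    logic of any particular engine.
    """
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division, avoids floating-point rounding.
    return -(-file_size_bytes // split_size_bytes)


# Hypothetical layouts holding the same 10 GiB of data.
one_large_file = estimate_splits(10 * 1024 * MIB)   # a single 10 GiB file
many_small_files = 80 * estimate_splits(128 * MIB)  # 80 files of 128 MiB each

print(one_large_file, many_small_files)
```

Under these simplified assumptions both layouts give the same number of
splits, which is the intuition behind the question.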

