spark-user mailing list archives

From Prithish
Subject Question about In-Memory size (cache / cacheTable)
Date Thu, 27 Oct 2016 05:19:24 GMT

I am trying to understand why the in-memory size of a cached table differs
so much depending on the source file format. Specifically, why is the
in-memory size so much larger for Avro and Parquet than for CSV? Are there
any optimizations that would reduce it?

I used cacheTable on each of these files:

Avro file (600 KB) - in-memory size was 12 MB
Parquet file (600 KB) - in-memory size was 12 MB
CSV file (3 MB, same data as the files above) - in-memory size was 600 KB
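
A comparison like this could be reproduced roughly as follows. This is only a sketch, assuming Spark 2.x with an existing SparkSession and, for Avro, the spark-avro package on the classpath. The helper names, table names, and any paths are hypothetical and not necessarily what was actually run. The sizes in the dictionary are the ones reported above, taking 1 MB = 1024 KB.

```python
def cache_and_materialize(spark, path, fmt, table_name):
    """Load a file, register it as a temp view, cache it, and fill the cache.

    `spark` is assumed to be an existing SparkSession (Spark 2.x).
    `fmt` is a data source name such as "parquet" or "csv"; for Avro this
    sketch assumes "com.databricks.spark.avro" from the spark-avro package.
    """
    df = spark.read.format(fmt).load(path)
    df.createOrReplaceTempView(table_name)
    spark.catalog.cacheTable(table_name)
    # cacheTable is lazy: a full scan is needed before the cached size
    # appears in the Storage tab of the Spark UI.
    return spark.table(table_name).count()


def expansion_ratio(on_disk_kb, in_memory_kb):
    """Return in-memory size divided by on-disk size."""
    return in_memory_kb / on_disk_kb


# Sizes reported in this message, in KB (assuming 1 MB = 1024 KB).
reported = {
    "avro": (600, 12 * 1024),
    "parquet": (600, 12 * 1024),
    "csv": (3 * 1024, 600),
}

for fmt, (disk_kb, mem_kb) in reported.items():
    print(f"{fmt}: {expansion_ratio(disk_kb, mem_kb):.2f}x")
```

With the reported numbers this works out to roughly a 20x expansion for Avro and Parquet and roughly a 5x reduction for CSV. A call such as cache_and_materialize(spark, "/data/sample.parquet", "parquet", "sample_parquet") (hypothetical path) would cache one file so its size can be read from the Storage tab.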

Because of this, we would need a cluster with much more memory if we were
to cache the Avro files.

Thanks for your help.

