spark-user mailing list archives

From Ramkumar Chokkalingam <>
Subject Re: Support for gz files ?
Date Mon, 21 Oct 2013 18:33:16 GMT
Oh, that really helps. My bad, I didn't read that clearly. In fact, I'm
already reading .gz files. My concern is whether it is efficient to run
the job directly on the .gz files, without unzipping them first, since
unzipping might itself take some time for my input size.

I have around 20K input files of ~250 KB each, all already in .gz
format. Also, I am not storing them in HDFS but reading them directly from
the local file system. So that the processing is split across multiple
files, should I decompress them and recompress them with the snappy utility
before running the job, or run it directly on the .gz input files?
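
To make the question concrete, here is a minimal sketch in plain Python
(standard library only, not Spark) of the "run directly on .gz" option. A
single gzip stream cannot be split, so the parallelism here comes only from
the number of files: each file is one independent task that is decompressed
on the fly while it is read. The helper names (count_lines,
count_lines_in_dir, make_sample_files), the part-N.gz file names, and the
worker count are all hypothetical and chosen for illustration; the sample
files are tiny stand-ins for the real input.

```python
import gzip
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_lines(path):
    """Stream one .gz file and count its lines without unpacking it to disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return sum(1 for _ in handle)


def count_lines_in_dir(directory, workers=4):
    """Treat every .gz file as one independent task and sum the counts."""
    paths = sorted(Path(directory).glob("*.gz"))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_lines, paths))


def make_sample_files(directory, n_files=5, lines_per_file=3):
    """Write a few small gzip files (hypothetical stand-ins for the real input)."""
    for i in range(n_files):
        target = Path(directory) / f"part-{i}.gz"
        with gzip.open(target, "wt", encoding="utf-8") as handle:
            for j in range(lines_per_file):
                handle.write(f"record {i}-{j}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        make_sample_files(tmp)
        # Total line count across all sample files.
        print(count_lines_in_dir(tmp))
```

With many small files, as in my case, the per-file split already gives many
independent tasks, so this sketch never decompresses anything ahead of time.
Whether the same holds for the actual job is exactly what I am asking.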
