spark-user mailing list archives

From Ankur Chauhan
Subject data storage formats
Date Sat, 07 Dec 2013 09:02:01 GMT
Hi all,

I am wondering what people use as their on-disk storage format. Almost all the examples
I have seen use CSV files to store and load data, but that seems too simplistic for obvious
reasons (compressibility, to name one). I am interested to find out what people use to store
computation results. For example, suppose you ran some computation on log files
and want to store all sorts of metrics for each user so that you can later use Shark
to query them interactively. What is the preferred, or at least a good, format for storing
all the data? Parquet? RCFile? CSV? JSON?

-- Ankur