Spark is built on the standard Hadoop InputFormat and OutputFormat interfaces, so in principle any InputFormat or OutputFormat should work. Besides the convenience methods for text files (sparkContext.textFile(...)) and sequence files (sparkContext.sequenceFile(...)), you can specify your own InputFormat in sparkContext.hadoopFile(...), and your own OutputFormat when saving, for example with saveAsHadoopFile(...) on a pair RDD. As suggested in the first response, check out the API docs.
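
For what it's worth, here is a rough sketch of what that could look like in Scala. It uses the stock TextInputFormat and TextOutputFormat from the old org.apache.hadoop.mapred API as stand-ins for your own formats. The paths, the app name, and the local master are placeholders I made up, so adjust them for your setup, and treat this as a starting point rather than a tested recipe.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{TextInputFormat, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits on older Spark versions

object HadoopFileSketch {
  def main(args: Array[String]): Unit = {
    // App name and master are placeholders for a local test run.
    val conf = new SparkConf().setAppName("HadoopFileSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical paths; replace with your own.
    val inputPath = "hdfs:///tmp/example/input"
    val outputPath = "hdfs:///tmp/example/output"

    // Read with an explicit InputFormat. Swap TextInputFormat (and the
    // key/value types) for your own InputFormat implementation.
    val records = sc.hadoopFile[LongWritable, Text, TextInputFormat](inputPath)

    // Hadoop record readers reuse Writable instances, so copy the values
    // into plain Scala types before caching or collecting them.
    val lines = records.map { case (offset, text) => (offset.get, text.toString) }

    // Write with an explicit OutputFormat. Swap TextOutputFormat for your own.
    lines
      .map { case (offset, line) => (new LongWritable(offset), new Text(line)) }
      .saveAsHadoopFile[TextOutputFormat[LongWritable, Text]](outputPath)

    sc.stop()
  }
}
```

If your format is written against the newer org.apache.hadoop.mapreduce API, the corresponding calls are newAPIHadoopFile(...) and saveAsNewAPIHadoopFile(...).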


On Sat, Jan 18, 2014 at 10:16 PM, Ankur Chauhan <> wrote:
You may also want to consider Parquet. It is pretty efficient.

-- Ankur Chauhan