spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Which of the hadoop file formats are supported by Spark ?
Date Sun, 19 Jan 2014 07:00:47 GMT
Spark is built on the standard Hadoop InputFormat and OutputFormat
interfaces, so any InputFormat or OutputFormat should, in principle, be
supported. Besides the simplified interfaces for text files
(sparkContext.textFile(...)) and sequence files
(sparkContext.sequenceFile(...)), you can specify your own InputFormat in
sparkContext.hadoopFile(...) and your own OutputFormat when saving a pair
RDD with saveAsHadoopFile(...). As suggested in the first response, check
out the API docs.
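
To illustrate the read paths mentioned above, here is a minimal sketch, not a
definitive recipe. It assumes a local master and a plain text input; the
object name HadoopFileSketch, the helper readLines, and the default path
"input.txt" are made up for illustration. It reads the same file once through
textFile and once through the generic hadoopFile call with Hadoop's
TextInputFormat named explicitly.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object HadoopFileSketch {
  // Hypothetical helper: read a text file through the generic hadoopFile
  // entry point by naming the key type, value type, and InputFormat.
  def readLines(sc: SparkContext, path: String): RDD[String] =
    sc.hadoopFile[LongWritable, Text, TextInputFormat](path)
      // Hadoop may reuse Writable instances, so copy each value to a String.
      .map { case (_, line) => line.toString }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; a real job would get its master from spark-submit.
    val conf = new SparkConf().setAppName("hadoop-file-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val path = args.headOption.getOrElse("input.txt") // hypothetical input path

      val viaTextFile = sc.textFile(path)       // simplified interface
      val viaHadoopFile = readLines(sc, path)   // explicit InputFormat

      println(s"textFile: ${viaTextFile.count()} lines, " +
        s"hadoopFile: ${viaHadoopFile.count()} lines")
    } finally {
      sc.stop()
    }
  }
}
```

Swapping TextInputFormat (and the matching key and value types) for another
old-API (org.apache.hadoop.mapred) InputFormat should be the only change
needed to read a different format this way.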

TD


On Sat, Jan 18, 2014 at 10:16 PM, Ankur Chauhan <achauhan@brightcove.com> wrote:

> You may also want to consider Parquet (http://parquet.io). It is pretty
> efficient; see http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/
>
> -- Ankur Chauhan
