spark-user mailing list archives

From Andrew Ehrlich
Subject Re: Bzip2 to Parquet format
Date Mon, 25 Jul 2016 03:00:21 GMT
You can load the text with sc.textFile() into an RDD[String], then use .map() to convert it into
an RDD[Row]. At this point you are ready to apply a schema: pass the RDD[Row] and the schema to
sqlContext.createDataFrame().

Here is an example on how to define the StructType (schema) that you will combine with the
RDD[Row] to create a DataFrame.

Once you have the DataFrame, save it to Parquet with df.write.parquet("/path") to create
a Parquet file.
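
Putting the steps above together, here is a rough PySpark sketch. The record layout (one "name,age" pair per line), the column names, and the helper names parse_line and convert are all made up for illustration, so adjust them to your data. It uses SparkSession (Spark 2.0+); on 1.x the same calls are available through sc and sqlContext. sc.textFile() reads .bz2 files directly, so no separate decompression step is needed.

```python
# Sketch only: assumes a hypothetical input with one "name,age" record per line.

def parse_line(line):
    """Turn one text line into a (name, age) tuple that matches the schema below."""
    name, age = line.split(",")
    return (name.strip(), int(age))


def convert(input_path, output_path):
    """Read (optionally bz2-compressed) text and write it out as Parquet."""
    # Imported here so parse_line can be tried out without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("bz2-to-parquet").getOrCreate()

    # The StructType describes the columns of the resulting DataFrame.
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])

    # textFile() picks the decompression codec from the file extension.
    lines = spark.sparkContext.textFile(input_path)
    rows = lines.map(parse_line)

    # Combine the parsed rows with the schema, then write Parquet.
    df = spark.createDataFrame(rows, schema)
    df.write.parquet(output_path)
    spark.stop()
```

You would call it as convert("/path/input.bz2", "/path"), where both paths are placeholders. Malformed lines will make parse_line raise, so real data probably needs a filter or a try/except around the parsing.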

Reference for SQLContext / createDataFrame:

> On Jul 24, 2016, at 5:34 PM, janardhan shetty wrote:
> We have data in Bz2 compression format. Any links in Spark to convert into Parquet and
> also performance benchmarks and uses study materials?
