spark-user mailing list archives

From Padmanabhan, Mahesh (contractor) <>
Subject Re: Spark streaming RDDs to Parquet records
Date Tue, 17 Jun 2014 21:47:02 GMT
Thanks, Krishna. It seems you have to use Avro and then convert that to Parquet. I was hoping
to convert RDDs directly to Parquet files. I’ll look into this some more.


From: Krishna Sankar <<>>
Reply-To: "<>" <<>>
Date: Tuesday, June 17, 2014 at 2:41 PM
To: "<>" <<>>
Subject: Re: Spark streaming RDDs to Parquet records


 *   One direction could be: create a Parquet schema, then convert the records and save them to HDFS.
 *   This might help
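As a rough sketch of the first step in that direction, the Scala below renders a Parquet schema in Parquet's textual message-type notation for a hypothetical event record. The record name `Event`, its fields (`id`, `timestamp`, `value`) and the `Field` helper are illustrative assumptions, not taken from this thread; a real schema would mirror the actual Kafka payload, and the converted records would still need a Parquet writer to reach HDFS.

```scala
// Hypothetical field descriptor: name, Parquet primitive type, optional logical annotation.
final case class Field(name: String, primitive: String, annotation: Option[String] = None)

object ParquetSchemaSketch {
  // Render a flat list of required fields in Parquet's textual message-type syntax.
  def messageType(name: String, fields: Seq[Field]): String = {
    val body = fields.map { f =>
      val ann = f.annotation.map(a => s" ($a)").getOrElse("")
      s"  required ${f.primitive} ${f.name}$ann;"
    }
    (s"message $name {" +: body :+ "}").mkString("\n")
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical event layout; replace with the real fields of the Kafka payload.
    val schema = messageType("Event", Seq(
      Field("id", "binary", Some("UTF8")),
      Field("timestamp", "int64"),
      Field("value", "double")
    ))
    println(schema)
  }
}
```

Marking every field `required` keeps the sketch short; fields that may be absent from some events would be declared `optional` instead.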


On Tue, Jun 17, 2014 at 12:52 PM, maheshtwc <<>>

Is there an easy way to convert RDDs within a DStream into Parquet records?
Here is some incomplete pseudocode:

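// Imports for the snippet below (Spark 1.0-era APIs)
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.kafka.KafkaUtils

// Create streaming context
val ssc = new StreamingContext(...)

// Obtain a DStream of events from Kafka
val ds = KafkaUtils.createStream(...)

// Get the Spark context in order to create the SQL context
val sc = ds.context.sparkContext
val sqlContext = new SQLContext(sc)

// For each RDD (one per batch interval)
ds.foreachRDD((rdd: RDD[Array[Byte]]) => {
    // What do I do next?
})

// Start the streaming computation and block until it stops
ssc.start()
ssc.awaitTermination()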


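For the "What do I do next?" step, one hedged option is to decode each `Array[Byte]` payload into a case class, since Spark SQL infers a schema from case class fields. The sketch assumes, purely hypothetically, that each Kafka message is UTF-8 text of the form `id,timestamp,value`; the `Event` class and the `decode` helper are illustrative names, not part of any library. A plain `Seq` stands in for one batch RDD so the example runs without Spark. In Spark SQL 1.0, an `RDD` of case classes can be converted with `import sqlContext.createSchemaRDD` and written with `saveAsParquetFile(path)`; that part is not exercised here.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical record type; its fields define the schema Spark SQL would infer.
final case class Event(id: String, timestamp: Long, value: Double)

object DecodeSketch {
  // Assumes each payload is UTF-8 text "id,timestamp,value"; returns None for malformed input.
  def decode(bytes: Array[Byte]): Option[Event] =
    new String(bytes, StandardCharsets.UTF_8).split(",", -1) match {
      case Array(id, ts, v) =>
        try Some(Event(id.trim, ts.trim.toLong, v.trim.toDouble))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    // A Seq stands in for one micro-batch RDD[Array[Byte]].
    val batch = Seq("a1,1403041622000,3.5", "garbage").map(_.getBytes(StandardCharsets.UTF_8))
    // Inside foreachRDD the same step would be rdd.flatMap(decode(_).toList),
    // followed (in Spark 1.0) by saveAsParquetFile via the createSchemaRDD implicit.
    val events = batch.flatMap(decode(_).toList)
    events.foreach(println)
  }
}
```

Dropping malformed records with `Option` keeps one bad message from failing the whole batch; a production job might instead count or log them.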

