spark-user mailing list archives

From Mohamed Nadjib MAMI <>
Subject Partition RDD by key to create DataFrames
Date Tue, 15 Mar 2016 17:33:44 GMT

I have a pair RDD of the form: (mykey, (value1, value2))

How can I create a DataFrame having the schema [V1 String, V2 String] to 
store [value1, value2] and save it into a Parquet table named "mykey"?

The createDataFrame() method takes an RDD and a schema (StructType) as 
parameters. The schema is known up front ([V1 String, V2 String]), but 
I can't work out how to split the original RDD by key so that each key 
gets its own RDD.
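
For illustration, here is a minimal sketch of one possible approach, not a 
definitive answer: collect the distinct keys, filter the pair RDD once per 
key, and write each resulting DataFrame to its own Parquet directory. It 
assumes the Spark 1.6 SQLContext API, String keys that are safe to use as 
directory names, and a number of distinct keys small enough to collect to 
the driver. The sample data, the object name SplitByKey and the output 
path /tmp/tables are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SplitByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SplitByKey").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical sample data standing in for the (mykey, (value1, value2)) RDD.
    val pairs = sc.parallelize(Seq(
      ("a", ("x1", "y1")),
      ("a", ("x2", "y2")),
      ("b", ("x3", "y3"))
    ))

    // The schema known up front: [V1 String, V2 String].
    val schema = StructType(Seq(
      StructField("V1", StringType, nullable = true),
      StructField("V2", StringType, nullable = true)
    ))

    // Cache the RDD because it is scanned once per key below.
    pairs.cache()

    // Assumption: the distinct keys fit comfortably in driver memory.
    val keys = pairs.keys.distinct().collect()

    keys.foreach { key =>
      // Keep only the records for this key and drop the key itself.
      val rows = pairs
        .filter { case (k, _) => k == key }
        .map { case (_, (v1, v2)) => Row(v1, v2) }

      val df = sqlContext.createDataFrame(rows, schema)

      // Assumed output location: one Parquet directory per key.
      df.write.mode("overwrite").parquet(s"/tmp/tables/$key")
    }

    sc.stop()
  }
}
```

This makes one pass over the data per key, so it is only reasonable for a 
modest number of keys. If a single Parquet dataset split into per-key 
subdirectories is acceptable instead of separate tables, keeping the key 
as a column and calling df.write.partitionBy("mykey").parquet(path) 
writes everything in one job.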

Similar questions have been asked before, but they do not use DataFrames.

Thanks in advance!
