spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Group by specific key and save as parquet
Date Wed, 02 Sep 2015 02:15:30 GMT
Starting from Spark 1.4, you can do this via dynamic partitioning:

sqlContext.table("trade").write.partitionBy("date").parquet("/tmp/path")
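
This writes one subdirectory per distinct value of the "date" column, e.g.
/tmp/path/date=2015-09-01/part-*.parquet. Reading the root path back
recovers "date" as a regular column, and filtering on it lets Spark skip
the other partitions. A minimal sketch (the path and date value here are
just illustrative):

val trades = sqlContext.read.parquet("/tmp/path")
trades.filter("date = '2015-09-01'").show()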

Cheng

On 9/1/15 8:27 AM, gtinside wrote:
> Hi,
>
> I have a set of data that I need to group by a specific key and then save
> as Parquet. Refer to the code snippet below: I am querying the trade table
> and then grouping by date.
>
> import org.apache.hadoop.fs.Path
>
> val df = sqlContext.sql("SELECT * FROM trade")
> val dfSchema = df.schema
> val partitionKeyIndex = dfSchema.fieldNames.indexOf("date")
>
> // group by date
> val groupedByPartitionKey = df.rdd.groupBy { row =>
>   row.getString(partitionKeyIndex)
> }
>
> // write one Parquet file per date, returning whether each write failed
> val failure = groupedByPartitionKey.map { case (date, rows) =>
>   // creating an RDD and a DataFrame here, inside another RDD's
>   // transformation, is the nested RDD problem mentioned below
>   val rowDF = sqlContext.createDataFrame(sc.parallelize(rows.toSeq), dfSchema)
>   val fileName = config.getTempFileName(date)
>   try {
>     val dest = new Path(fileName)
>     if (DefaultFileSystem.getFS.exists(dest)) {
>       DefaultFileSystem.getFS.delete(dest, true)
>     }
>     rowDF.saveAsParquetFile(fileName)
>     false
>   } catch {
>     case e: Throwable =>
>       logError("Failed to save parquet file")
>       true
>   }
> }
>
> This code doesn't work because it creates an RDD (and a DataFrame) inside
> another RDD's transformation, i.e. a nested RDD, which Spark doesn't
> support. What is the best way to solve this problem?
>
> Regards,
> Gaurav
>
>
>
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Group-by-specific-key-and-save-as-parquet-tp24527.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

