spark-user mailing list archives

From karthikjay <aswin8...@gmail.com>
Subject [Structured Streaming][Parquet] How do specify partition and data when saving to Parquet
Date Sat, 03 Mar 2018 06:30:27 GMT
My DataFrame has the following schema:
root
 |-- data: struct (nullable = true)
 |    |-- zoneId: string (nullable = true)
 |    |-- deviceId: string (nullable = true)
 |    |-- timeSinceLast: long (nullable = true)
 |-- date: date (nullable = true)

 
How can I use writeStream with the Parquet format to write only the fields
of data (zoneId, deviceId, timeSinceLast), leaving the date column out of the
files, while partitioning the output by date? I tried the following code, but
the partitionBy clause did not work:

val query1 = df1
      .writeStream
      .format("parquet")
      .option("path", "/Users/abc/hb_parquet/data")
      .option("checkpointLocation", "/Users/abc/hb_parquet/checkpoint")
      .partitionBy("data.zoneId")
      .start()
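
One possible approach, offered as a sketch rather than a confirmed fix: as far
as I know, partitionBy on a file sink expects top-level column names, so a
nested field such as "data.zoneId" is likely not resolved. Flattening the
struct first and partitioning by the top-level date column should give the
layout described above. The partition value is encoded in the directory name
(date=...) and is not stored inside the Parquet files themselves. The object
and method names below (ParquetSinkSketch, writePartitionedByDate) are
hypothetical; the paths are the ones from the original snippet.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

object ParquetSinkSketch {
  // Hypothetical helper: df1 is assumed to be the streaming DataFrame
  // with the schema shown above (data: struct, date: date).
  def writePartitionedByDate(df1: DataFrame): StreamingQuery = {
    df1
      // Expand the struct into top-level columns zoneId, deviceId,
      // timeSinceLast, and keep date so it can be used for partitioning.
      .select("data.*", "date")
      .writeStream
      .format("parquet")
      .option("path", "/Users/abc/hb_parquet/data")
      .option("checkpointLocation", "/Users/abc/hb_parquet/checkpoint")
      // Partition by a top-level column; each date value gets its own
      // date=<value> subdirectory under the output path.
      .partitionBy("date")
      .start()
  }
}
```

Calling ParquetSinkSketch.writePartitionedByDate(df1) returns the running
StreamingQuery. To partition by zoneId as well, it would have to be passed as
a top-level column after the select, for example partitionBy("date", "zoneId").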



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
