spark-user mailing list archives

From Aakash Basu <aakash.spark....@gmail.com>
Subject [Structured Streaming] How to save entire column aggregation to a file
Date Thu, 05 Apr 2018 08:58:17 GMT
Hi,

I want to save an aggregate to a file without using any window, watermark
or groupBy, so the aggregation runs over the entire column.

df = spark.sql("select avg(col1) as aver from ds")
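
To make the intent concrete, here is a minimal plain-Python sketch (not Spark
code) of a whole-column average over a stream: the state is a running sum and
count, and the single result value changes with every micro-batch. The class
name RunningAverage and the sample batches are hypothetical, made up for
illustration only.

```python
class RunningAverage:
    """Keeps a running sum and count so the average covers every value seen so far."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch):
        """Fold one micro-batch of col1 values into the state and return the new average."""
        self.total += sum(batch)
        self.count += len(batch)
        # No values seen yet means there is no average to report.
        return self.total / self.count if self.count else None


# Hypothetical micro-batches of col1 values, one list per trigger.
batches = [[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]
state = RunningAverage()
averages = [state.update(b) for b in batches]
print(averages)
```

Because the one result row is revised on every trigger instead of being
finalized, it does not fit a write-once, append-only output.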


Now, the challenges are as follows:

1) If I use outputMode = Append (the default), it fails with "*Append output
mode not supported when there are streaming aggregations on streaming
DataFrames/DataSets without watermark*"

query2 = df \
    .writeStream \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()



2) If I use outputMode = Complete, it fails with "*Data source parquet does
not support Complete output mode;*"

query2 = df \
    .writeStream \
    .outputMode("complete") \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()


What should I do? How should I go about it?

Thanks,
Aakash.
