spark-user mailing list archives

From Aakash Basu <aakash.spark....@gmail.com>
Subject Fwd: [Structured Streaming] How to save entire column aggregation to a file
Date Fri, 06 Apr 2018 10:22:52 GMT
Any help?

I need this urgently. Could someone please clarify?

---------- Forwarded message ----------
From: Aakash Basu <aakash.spark.raj@gmail.com>
Date: Thu, Apr 5, 2018 at 2:28 PM
Subject: [Structured Streaming] How to save entire column aggregation to a
file
To: user <user@spark.apache.org>


Hi,

I want to save an aggregate to a file without using any window, watermark
or groupBy, so the aggregation runs over the entire column.

df = spark.sql("select avg(col1) as aver from ds")


The challenge is as follows:

1) If I use outputMode = Append (the default when none is set), the query
fails with "*Append output mode not supported when there are streaming
aggregations on streaming DataFrames/DataSets without watermark*".

query2 = df \
    .writeStream \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()



2) If I use outputMode = Complete, the query fails with "*Data source
parquet does not support Complete output mode;*".

query2 = df \
    .writeStream \
    .outputMode("complete") \
    .format("parquet") \
    .option("path", "/home/aakashbasu/Downloads/Kafka_Testing/Temp_AvgStore/") \
    .option("checkpointLocation", "/home/aakashbasu/Downloads/Kafka_Testing/") \
    .trigger(processingTime='3 seconds') \
    .start()


What to do? How to go about it?

Thanks,
Aakash.
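
One possible way around both errors, offered as a sketch under assumptions
rather than a confirmed answer from the list: keep Complete output mode
(which a global aggregate without watermark requires) but avoid the
streaming parquet sink by writing each micro-batch with the ordinary batch
writer through foreachBatch. This assumes Spark 2.4 or later, where
DataStreamWriter.foreachBatch exists; it is not available in earlier
releases. The helper names make_batch_writer and start_avg_query, and the
out_path and checkpoint_path parameters, are hypothetical and not from the
original message.

```python
def make_batch_writer(out_path):
    """Return a foreachBatch callback that overwrites out_path each trigger.

    In Complete mode every micro-batch carries the full result table
    (here a single row holding avg(col1)), so overwriting keeps only
    the latest value on disk.
    """
    def write_batch(batch_df, batch_id):
        # batch_df is a regular (non-streaming) DataFrame, so the
        # batch parquet writer can be used on it.
        batch_df.write.mode("overwrite").parquet(out_path)
    return write_batch


def start_avg_query(df, out_path, checkpoint_path):
    """Start the streaming query for an aggregated streaming DataFrame.

    df is assumed to be the result of
    spark.sql("select avg(col1) as aver from ds").
    """
    return (
        df.writeStream
        .outputMode("complete")                       # needed for a global aggregate
        .foreachBatch(make_batch_writer(out_path))    # assumes Spark >= 2.4
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime="3 seconds")
        .start()
    )
```

It would be called as query2 = start_avg_query(df, out_path, checkpoint_path).
Using a checkpoint directory that does not contain the output directory is
probably the safer layout, though that is a design choice and not something
the errors above depend on.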
