spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Spark 2.2 streaming with append mode: empty output
Date Mon, 14 Aug 2017 23:55:42 GMT
In append mode, the aggregation outputs a row only once the watermark has
passed the end of that row's window and the corresponding aggregate is
*final*, that is, will not be updated any more.
See
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
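
To make the timing concrete for the query quoted below, here is a minimal sketch in plain Python (not the Spark API) of the rule described above. It assumes the watermark is the latest event time seen so far minus the delay, and that a window counts as final once the watermark reaches the window's end. The function name `window_is_final` and the sample timestamps are hypothetical; only the 1-day window and the 1-minute delay come from the quoted code.

```python
from datetime import datetime, timedelta


def window_is_final(window_end, max_event_time, delay):
    """Simplified model: a window's aggregate is final (and can be emitted
    in append mode) once the watermark has reached the end of the window."""
    # Assumption: watermark = latest event time seen so far minus the delay.
    watermark = max_event_time - delay
    return watermark >= window_end


# Values taken from the quoted query: 1-day window, 1-minute watermark delay.
delay = timedelta(minutes=1)
window_end = datetime(2017, 8, 15, 0, 0)  # hypothetical window [Aug 14, Aug 15)

# Latest event is still inside the day: the row is not final yet.
late_evening = window_is_final(window_end, datetime(2017, 8, 14, 23, 55), delay)

# Latest event is past midnight plus the delay: the row can now be emitted.
next_day = window_is_final(window_end, datetime(2017, 8, 15, 0, 1), delay)

print(late_evening, next_day)
```

Under this model the first check is false and the second is true: with a 1-day window, append mode has nothing to emit until events from the following day arrive, whereas complete mode prints the running totals on every trigger.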

On Mon, Aug 14, 2017 at 4:09 PM, Ashwin Raju <theraju@gmail.com> wrote:

> Hi,
>
> I am running Spark 2.2 and trying out structured streaming. I have the
> following code:
>
> from pyspark.sql import functions as F
>
> df = frame \
>     .withWatermark("timestamp", "1 minute") \
>     .groupby(F.window("timestamp", "1 day"), *groupby_cols) \
>     .agg(F.sum("bytes"))
>
> query = df.writeStream \
>     .format("console") \
>     .option("checkpointLocation", '\some\chkpoint') \
>     .outputMode("complete") \
>     .start()
>
> query.awaitTermination()
>
> It prints out a bunch of aggregated rows to console. When I run the same
> query with outputMode("append") however, the output only has the column
> names, no rows. I was originally trying to output to parquet, which only
> supports append mode. I was seeing no data in my parquet files, so I
> switched to console output to debug, then noticed this issue. Am I
> misunderstanding something about how append mode works?
>
> Thanks,
> Ashwin
