spark-user mailing list archives

From Becket Qin
Subject [Spark SQL] Is it possible to do stream to stream inner join without event time?
Date Fri, 01 Jun 2018 10:10:18 GMT

I am new to Spark and I'm trying to run a few queries from TPC-H using
Spark SQL.

According to the documentation here
it is OPTIONAL to have a watermark defined in the case of an inner join between
two streams. However, I keep getting the following exception:

org.apache.spark.sql.AnalysisException: Append output mode not supported
when there are streaming aggregations on streaming DataFrames/DataSets
without watermark

So it looks like the watermark is mandatory. Because there is no timestamp
in the TPC-H records, I am not able to specify a watermark with event time.
Is there a recommended workaround, e.g. using processing time instead of
event time?


Jiangjie (Becket) Qin
