spark-user mailing list archives

From Becket Qin <becket....@gmail.com>
Subject [Spark SQL] Is it possible to do stream to stream inner join without event time?
Date Fri, 01 Jun 2018 10:10:18 GMT
Hi,

I am new to Spark and I'm trying to run a few queries from TPC-H using
Spark SQL.

According to the documentation here
<https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#inner-joins-with-optional-watermarking>,
it is OPTIONAL to define a watermark for an inner join between
two streams. However, I keep getting the following exception:

org.apache.spark.sql.AnalysisException: Append output mode not supported
when there are streaming aggregations on streaming DataFrames/DataSets
without watermark

So it looks like the watermark is mandatory. Because there is no timestamp
in the TPC-H records, I am not able to specify a watermark on event time.
Is there a recommended workaround, e.g. using processing time instead of
event time?

Thanks,

Jiangjie (Becket) Qin
