spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: Is watermark always set using processing time or event time or both?
Date Mon, 04 Sep 2017 07:05:19 GMT
Hi,

It's event time-based by default, since the withWatermark operator only
takes the name of an event-time column (plus a delay threshold); there's
no way to tell it to use processing time instead.

See http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@withWatermark(eventTime:String,delayThreshold:String):org.apache.spark.sql.Dataset[T]
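To make the event-time behaviour concrete, here is a plain-Python model (not Spark's implementation and not its API) of the rule Structured Streaming follows. After each micro-batch the watermark becomes the maximum event time seen so far minus the delay threshold. It never moves backwards, and rows older than it count as late. The class name EventTimeWatermark and its methods are hypothetical. Spark itself only guarantees not to drop data that is within the threshold, so treating every older row as "late" is a simplification.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


class EventTimeWatermark:
    """Hypothetical, simplified model (not Spark code) of an event-time
    watermark: watermark = max event time seen so far - delay threshold."""

    def __init__(self, delay: timedelta) -> None:
        self.delay = delay
        self.max_event_time: Optional[datetime] = None

    def current(self) -> Optional[datetime]:
        # No watermark exists until the first event has been seen.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.delay

    def process_batch(
        self, event_times: List[datetime]
    ) -> Tuple[List[datetime], List[datetime]]:
        # The watermark applied to this batch comes from earlier batches.
        wm = self.current()
        kept = [t for t in event_times if wm is None or t >= wm]
        late = [t for t in event_times if wm is not None and t < wm]
        # Advance only forwards: the max event time never decreases.
        if event_times:
            batch_max = max(event_times)
            if self.max_event_time is None or batch_max > self.max_event_time:
                self.max_event_time = batch_max
        return kept, late


# Usage: a 10-minute delay threshold.
wm = EventTimeWatermark(delay=timedelta(minutes=10))
noon = datetime(2017, 9, 1, 12, 0)
wm.process_batch([noon, noon + timedelta(minutes=5)])
print(wm.current())  # 2017-09-01 11:55:00
```

The only input is the event-time column itself, which mirrors why withWatermark is event time-based: the wall clock never enters the calculation.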

But...

If your initial Dataset has no event-time column, you can generate one
at processing time using current_date, current_timestamp, or some other
way. That gives you the other option: a watermark based on processing
time.
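A similar sketch for the processing-time option, assuming the column is added with something like current_timestamp(). As far as I know, in Structured Streaming that function is evaluated once per micro-batch, so every row of a batch gets the same timestamp. The helpers stamp_batch and watermarks are hypothetical, and the clock is injected so the example is deterministic.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[str, datetime]


def stamp_batch(rows: Iterable[str], clock: Callable[[], datetime]) -> List[Row]:
    """Hypothetical helper: attach a processing-time column to one micro-batch.

    The clock is read once, so every row in the batch gets the same
    timestamp (a simplified stand-in for current_timestamp())."""
    batch_ts = clock()
    return [(row, batch_ts) for row in rows]


def watermarks(batches: Iterable[List[Row]], delay: timedelta) -> List[Optional[datetime]]:
    """Hypothetical helper: watermark after each batch = max timestamp seen - delay."""
    max_ts: Optional[datetime] = None
    result: List[Optional[datetime]] = []
    for batch in batches:
        for _, ts in batch:
            if max_ts is None or ts > max_ts:
                max_ts = ts
        result.append(None if max_ts is None else max_ts - delay)
    return result


# Usage with a fake clock that ticks one minute per batch.
clock = iter([datetime(2017, 9, 1, 12, 0), datetime(2017, 9, 1, 12, 1)]).__next__
b1 = stamp_batch(["a", "b"], clock)
b2 = stamp_batch(["c"], clock)
marks = watermarks([b1, b2], timedelta(seconds=30))
```

With a clock that never goes backwards, each batch is stamped after the previous watermark was computed, so no row can ever be late here; the watermark simply trails the wall clock by the delay.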

And last but not least...

With the most generic solution,
KeyValueGroupedDataset.flatMapGroupsWithState, you can use one of the
predefined timeout strategies (GroupStateTimeout.ProcessingTimeTimeout
or GroupStateTimeout.EventTimeTimeout) or write a custom one. That's why
it's called a solution for "arbitrary aggregation".

* http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.KeyValueGroupedDataset

* https://youtu.be/JAb4FIheP28
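As a rough illustration of the "custom strategy" idea, the model below keeps per-key state (a running count) and asks a user-supplied function whether a key should expire, much as a flatMapGroupsWithState function decides what to do with its GroupState. It is plain Python, not Spark's API. KeyState, idle_timeout and run_batch are hypothetical names, and `now` can be either processing time or an event-time watermark; picking which is exactly the choice the strategy gets to make.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class KeyState:
    """Hypothetical per-key state: a running count and last update time."""
    count: int
    last_seen: float  # seconds, on whatever clock the strategy uses


# A strategy decides, per key, whether its state should be evicted now.
Strategy = Callable[[KeyState, float], bool]


def idle_timeout(seconds: float) -> Strategy:
    """Hypothetical strategy: expire a key after `seconds` without new data."""
    return lambda state, now: now - state.last_seen > seconds


def run_batch(
    states: Dict[str, KeyState],
    keys: Iterable[str],
    now: float,
    should_expire: Strategy,
) -> List[Tuple[str, int]]:
    """Update per-key counts, then emit (key, final_count) for every key the
    strategy expires and drop that key's state."""
    for key in keys:
        st = states.get(key)
        if st is None:
            states[key] = KeyState(count=1, last_seen=now)
        else:
            st.count += 1
            st.last_seen = now
    expired = [k for k, st in states.items() if should_expire(st, now)]
    return [(k, states.pop(k).count) for k in expired]


# Usage: expire keys idle for more than 60 seconds.
states: Dict[str, KeyState] = {}
strategy = idle_timeout(60)
run_batch(states, ["a", "a", "b"], now=0, should_expire=strategy)
run_batch(states, ["b"], now=50, should_expire=strategy)
out = run_batch(states, [], now=100, should_expire=strategy)
```

Swapping idle_timeout for another function (for example one that compares against an event-time watermark) changes the eviction policy without touching run_batch.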

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Sep 1, 2017 at 8:15 PM, kant kodali <kanth909@gmail.com> wrote:
> Is watermark always set using processing time or event time or both?
