spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Count-based windows
Date Mon, 08 Dec 2014 08:56:46 GMT
Hi,

I am interested in building an application that uses sliding windows not
based on the time when the item was received, but on either
* a timestamp embedded in the data, or
* a count (like: every 10 items, look at the last 100 items).

I want to do this both on stream data received from Kafka and on HDFS data
(where there is clearly no "received at" time). I found
<http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-windowing-Driven-by-absolutely-time-td1733.html#a1843>
as an instruction for how to use the timestamp, but does anyone have a
suggestion on how to use an item count as the window size constraint?
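
To make the count-based case concrete, here is a minimal sketch of the
intended semantics ("every 10 items, look at the last 100 items") in plain
Python. It is not a Spark or Spark Streaming API; the function name
count_windows and its parameters are hypothetical and only illustrate the
behaviour on a single, already ordered sequence of items.

```python
from collections import deque
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def count_windows(items: Iterable[T], size: int = 100, slide: int = 10) -> Iterator[List[T]]:
    """Yield the most recent `size` items each time `slide` new items arrive.

    Hypothetical helper for illustration only: it assumes the items come in
    a single, well-defined order, which a distributed dataset does not
    provide by default.
    """
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")

    buffer = deque(maxlen=size)  # automatically drops items older than `size`
    since_last_window = 0

    for item in items:
        buffer.append(item)
        since_last_window += 1
        if since_last_window == slide:
            since_last_window = 0
            # Early windows hold fewer than `size` items until the buffer fills.
            yield list(buffer)


if __name__ == "__main__":
    for window in count_windows(range(250), size=100, slide=10):
        print(len(window), window[0], window[-1])
```

The sketch sidesteps the hard part of the question: on partitioned data
(Kafka or HDFS) the items would first need a global order or index before
windows like these are well defined.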

Thanks
Tobias
