spark-user mailing list archives

From Tobias Pfeiffer <>
Subject Count-based windows
Date Mon, 08 Dec 2014 08:56:46 GMT

I am interested in building an application that uses sliding windows based
not on the time an item was received, but on either
* a timestamp embedded in the data, or
* a count (for example: every 10 items, look at the last 100 items).

I want to do this both on stream data received from Kafka and on HDFS data
(where there is clearly no notion of a time of receipt). I found <>
as an instruction for how to use the timestamp, but does anyone have a
suggestion on how to use an item count as the window size constraint?
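
To make the count-based case concrete, here is a minimal sketch of the
semantics in plain Python. It is not Spark code and uses no Spark API; the
function name count_windows and its parameters size and slide are
hypothetical, chosen only for illustration. It also assumes that a window
may be emitted before it is full, which is a design choice rather than
anything required by the description above.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def count_windows(items: Iterable[T], size: int, slide: int) -> Iterator[List[T]]:
    """Hypothetical helper: after every `slide` items, yield the last `size` items."""
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    # A bounded deque silently drops the oldest item once `size` is reached.
    buf: Deque[T] = deque(maxlen=size)
    for n, item in enumerate(items, start=1):
        buf.append(item)
        # Emit a snapshot every `slide` items; early windows may be shorter than `size`.
        if n % slide == 0:
            yield list(buf)


# Small-scale stand-in for "every 10 items, look at the last 100 items":
# every 5 items, look at the last 10 items of a 30-item input.
windows = list(count_windows(range(1, 31), size=10, slide=5))
```

With size=100 and slide=10 this matches the "every 10 items, look at the
last 100 items" example. The sketch relies on a single, totally ordered
sequence of items; how to get that ordering across partitions of a Kafka
stream or an HDFS file is exactly the open part of the question.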

