spark-user mailing list archives

From Alan Ngai <a...@opsclarity.com>
Subject streaming window not behaving as advertised (v1.0.1)
Date Wed, 23 Jul 2014 01:01:59 GMT
I have a sample application that emits one record per second.  The batch interval is set to
5 seconds.  Here’s a list of “observed window intervals” vs. what was actually set:

window=25, slide=25 : observed-window=25, overlapped-batches=0
window=25, slide=20 : observed-window=20, overlapped-batches=0
window=25, slide=15 : observed-window=15, overlapped-batches=0
window=25, slide=10 : observed-window=20, overlapped-batches=2
window=25, slide=5 : observed-window=25, overlapped-batches=3

Can someone explain this behavior to me?  I’m trying to aggregate metrics by time batches
but want to skip partial batches, so I’m looking for a combination that results in
1 overlapped batch.  No combination I tried gets me there.
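
For comparison, here is a rough sketch of what the settings above would be expected to give if a window simply covers the last window/batch-interval batches and advances by the slide each time. That model is an assumption about the advertised semantics, not a statement of what v1.0.1 does, and it does not explain the observed numbers. The helper names (window_batches, overlapped_batches) are made up for illustration and are not part of any Spark API.

```python
def window_batches(window, slide, batch, n_windows=2):
    """List the batch end times covered by each of the first n_windows windows.

    Assumed model: a window ending at time t covers the batches whose end
    times lie in (t - window, t]; consecutive windows end `slide` apart.
    """
    if window % batch or slide % batch:
        raise ValueError("window and slide must be multiples of the batch interval")
    per_window = window // batch
    windows = []
    for k in range(n_windows):
        end = window + k * slide  # the first full window ends at t = window
        windows.append([end - i * batch for i in reversed(range(per_window))])
    return windows


def overlapped_batches(window, slide, batch):
    """Count the batches shared by two consecutive windows under the assumed model."""
    first, second = window_batches(window, slide, batch, n_windows=2)
    return len(set(first) & set(second))


if __name__ == "__main__":
    # Same settings as in the list above: batch interval 5, window 25.
    for slide in (25, 20, 15, 10, 5):
        print(f"window=25, slide={slide} : "
              f"expected-overlapped-batches={overlapped_batches(25, slide, 5)}")
```

Under this model the overlap is (window - slide) / batch interval whenever the slide is shorter than the window, so window=25, slide=20 is the combination that would be expected to give 1 overlapped batch.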

Alan

