spark-user mailing list archives

From Laeeq Ahmed <>
Subject Spark Streaming question batch size
Date Tue, 01 Jul 2014 09:34:43 GMT

The window size in Spark Streaming is time-based, which means each window can contain a different
number of elements. For example, suppose you have two (or more) streams that are related to each
other and you want to compare them over a specific time interval. I am not clear how this will
work: although the streams start running simultaneously, they might have a different number of
elements in each time interval.
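To make the problem concrete, here is a small plain-Python sketch (not the Spark API) that assigns timestamped elements to fixed tumbling windows and counts the elements in each one. The arrival times are hypothetical; the only point is that two streams with the same total can split very differently across identical time windows.

```python
from collections import Counter
from typing import Dict, Iterable


def counts_per_window(timestamps: Iterable[float], width: float) -> Dict[float, int]:
    """Count the elements that fall into each tumbling window of `width` seconds.

    Each window is keyed by its start time, floor(ts / width) * width.
    """
    counts: Counter = Counter()
    for ts in timestamps:
        counts[(ts // width) * width] += 1
    return dict(sorted(counts.items()))


# Hypothetical arrival times (in seconds) for two streams of 6 elements each.
stream1 = [0.1, 0.5, 0.9, 1.2, 2.4, 2.8]
stream2 = [0.3, 1.1, 1.5, 1.7, 1.9, 2.2]

print(counts_per_window(stream1, 1.0))  # {0.0: 3, 1.0: 1, 2.0: 2}
print(counts_per_window(stream2, 1.0))  # {0.0: 1, 1.0: 4, 2.0: 1}
```

Both streams total 6 elements, yet no window holds the same count in both.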

The following is the output for two streams that have the same number of elements and were run
simultaneously. The leftmost value is the number of elements in each window. If we add up the
number of elements, the totals are the same for both streams, but we can't compare the two
streams because they differ in window size and in the number of windows.
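One way to make the two outputs comparable is to key every window by its start time, taken from a timestamp carried by each element, and then pair the two streams' statistics window by window. The sketch below is plain Python, not Spark code; it assumes each element arrives as a hypothetical `(timestamp, value)` pair and uses the population variance (dividing by n), since the post does not say which variance the original output used.

```python
import math
from typing import Dict, List, Tuple

# (n, (mean, variance, SD)): the per-window format used in the post.
Stats = Tuple[int, Tuple[float, float, float]]


def window_stats(values: List[float]) -> Stats:
    """Summarize one non-empty window using the population variance (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return n, (mean, variance, math.sqrt(variance))


def stats_by_window(events: List[Tuple[float, float]], width: float) -> Dict[float, Stats]:
    """Bucket (timestamp, value) pairs into tumbling windows keyed by window start."""
    buckets: Dict[float, List[float]] = {}
    for ts, value in events:
        buckets.setdefault((ts // width) * width, []).append(value)
    return {start: window_stats(vals) for start, vals in sorted(buckets.items())}


def compare_streams(
    a: List[Tuple[float, float]], b: List[Tuple[float, float]], width: float
) -> Dict[float, Tuple[Stats, Stats]]:
    """Pair both streams' stats for every window start that both streams cover."""
    sa, sb = stats_by_window(a, width), stats_by_window(b, width)
    return {start: (sa[start], sb[start]) for start in sorted(sa.keys() & sb.keys())}


# Hypothetical (timestamp, value) events for two streams.
s1 = [(0.2, 2.0), (0.7, 4.0), (1.3, 5.0)]
s2 = [(0.4, 3.0), (1.1, 1.0), (1.8, 3.0), (2.5, 7.0)]
for start, (x, y) in compare_streams(s1, s2, 1.0).items():
    print(start, x, y)
```

Because the key is derived from the element's own timestamp rather than from the batch it happened to arrive in, both streams land in the same intervals even if they are processed at slightly different speeds. Windows covered by only one stream (here, the one starting at 2.0) are dropped; an outer join that reports the missing side is an equally reasonable choice.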

Can we somehow build windows based on real-time (timestamp) values for both streams? Or can we
build windows based on the number of elements?
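Spark Streaming's window operations such as `window(windowDuration, slideDuration)` take durations, so a window that holds a fixed number of elements is something you would have to assemble yourself. The sketch below shows the idea on a finite Python list rather than a live stream; `count_windows` and `paired_count_windows` are hypothetical helpers, not Spark functions. Each window holds exactly `size` consecutive elements, and the i-th window of one stream is paired with the i-th window of the other.

```python
from typing import Iterator, List, Sequence, Tuple


def count_windows(values: Sequence[float], size: int, slide: int) -> Iterator[List[float]]:
    """Yield windows of exactly `size` consecutive elements, advancing by `slide`.

    A trailing partial window (fewer than `size` elements) is not emitted.
    """
    if size <= 0 or slide <= 0:
        raise ValueError("size and slide must be positive")
    for start in range(0, len(values) - size + 1, slide):
        yield list(values[start:start + size])


def paired_count_windows(
    a: Sequence[float], b: Sequence[float], size: int, slide: int
) -> List[Tuple[List[float], List[float]]]:
    """Pair the i-th count-based window of stream a with the i-th window of stream b."""
    return list(zip(count_windows(a, size, slide), count_windows(b, size, slide)))


print(list(count_windows([1, 2, 3, 4, 5], 2, 2)))  # [[1, 2], [3, 4]]
print(paired_count_windows([1, 2, 3, 4], [5, 6, 7, 8, 9], 2, 2))
```

With count-based windows every pair has the same n by construction, but the pairs no longer correspond to the same wall-clock interval, so which approach fits depends on whether the comparison is meant to be per element or per unit of time.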

(n, (mean, variance, SD))

Stream 1


Stream 2

