spark-user mailing list archives

From Adrian Mocanu <amoc...@verticalscope.com>
Subject DStream spark paper
Date Thu, 20 Mar 2014 20:36:23 GMT
I looked over the specs on page 9 of http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-259.pdf
The first paragraph mentions a window size of 30 seconds: "Word-Count, which performs a sliding
window count over 30s; and TopKCount, which finds the k most frequent words over the past 30s."

The second paragraph mentions subsecond latency.

Putting these two together, is the paper saying that within the 30-second window the tuples are
delayed by at most 1 second?

The paper explains: "By 'end-to-end latency,' we mean the time from when records are sent to
the system to when results incorporating them appear." This leads me to conclude that the end-to-end
latency for a 30-second window should be at least 30 seconds, because results won't be incorporated
until the entire window has completed, i.e., after 30 seconds. At the same time, the paper claims
sub-second latency, so clearly I'm misunderstanding something.
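
To make the two readings concrete, here is a minimal Python sketch of the arithmetic. It is only my own illustration, not code from the paper: the function names, the sample records, and the 1-second emit interval are all assumptions (the quoted passage only gives the 30s window length). It compares a window that emits a result once every 30 seconds with a 30-second window that emits a result every second.

```python
import math

def first_result_time(record_time, emit_interval):
    """Time of the first emitted result that can include a record sent at
    record_time, assuming results are emitted at multiples of emit_interval."""
    return (math.floor(record_time / emit_interval) + 1) * emit_interval

def latency(record_time, emit_interval):
    """Delay between sending a record and the first result that includes it."""
    return first_result_time(record_time, emit_interval) - record_time

def window_count(records, emit_time, window):
    """Count words whose timestamps fall in [emit_time - window, emit_time)."""
    counts = {}
    for ts, word in records:
        if emit_time - window <= ts < emit_time:
            counts[word] = counts.get(word, 0) + 1
    return counts

# Hypothetical (timestamp in seconds, word) records.
records = [(3.0, "spark"), (12.5, "spark"), (12.5, "stream")]

# Reading 1: one result per 30s window, so the record waits for the window to close.
per_window_latency = latency(12.5, 30.0)

# Reading 2: the 30s window is re-evaluated every 1s (assumed emit interval).
sliding_latency = latency(12.5, 1.0)
sliding_counts = window_count(records, first_result_time(12.5, 1.0), 30.0)
```

Under the second reading, the window length (30s) and the latency (bounded by the emit interval) are separate quantities, which would reconcile the two statements, but I am not sure that is what the paper means.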

-Adrian

