storm-user mailing list archives

From "Michael G. Noll" <michael+st...@michael-noll.com>
Subject Re: Implementing Real-Time Trending Topics in Storm
Date Tue, 01 Apr 2014 18:20:25 GMT
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"Software Dev",

in RollingCountBolt there are two *time*-related settings:

1. The size (duration) of the sliding window itself, in seconds.
2. The time interval at which the latest sliding window count is sent
to downstream bolts, also in seconds.

See details here:
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/bolt/RollingCountBolt.java

Quoting from the code linked above:

"The bolt is configured by two parameters, the length of the sliding
window in seconds (which influences the output data of the bolt, i.e.
how it will count objects) and the emit frequency in seconds (which
influences how often the bolt will output the latest window counts).
For instance, if the window length is set to an equivalent of five
minutes and the emit frequency to one minute, then the bolt will
output the latest five-minute sliding window every minute."
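
To make the quoted example concrete, here is a minimal sketch in plain
Java with no Storm dependency.  The class WindowSettings and its methods
are hypothetical names made up for illustration; they are not part of
storm-starter.  The sketch assumes that the window is split into
(window length / emit frequency) slots, and it prints which time range
each emit would cover for a five-minute window emitted every minute.

```java
final class WindowSettings {

    // Hypothetical helper: how many slots the window is split into,
    // assuming one slot per emit interval.
    static int numSlots(int windowLengthInSeconds, int emitFrequencyInSeconds) {
        if (emitFrequencyInSeconds <= 0 || windowLengthInSeconds < emitFrequencyInSeconds) {
            throw new IllegalArgumentException(
                "window length must be >= emit frequency, and both must be positive");
        }
        return windowLengthInSeconds / emitFrequencyInSeconds;
    }

    // Hypothetical helper: start of the time range covered by an emit at
    // emitTimeInSeconds (clamped to 0 while the window is still filling up).
    static int windowStart(int emitTimeInSeconds, int windowLengthInSeconds) {
        return Math.max(0, emitTimeInSeconds - windowLengthInSeconds);
    }

    public static void main(String[] args) {
        int windowLengthInSeconds = 300;  // five minutes
        int emitFrequencyInSeconds = 60;  // one minute

        System.out.println("slots: " + numSlots(windowLengthInSeconds, emitFrequencyInSeconds));
        for (int t = emitFrequencyInSeconds; t <= 420; t += emitFrequencyInSeconds) {
            System.out.println("emit at t=" + t + "s covers ["
                + windowStart(t, windowLengthInSeconds) + "s, " + t + "s]");
        }
    }
}
```

Once the first five minutes have passed, every emit covers the most
recent 300 seconds, and consecutive emits overlap by four minutes.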


> Does this mean that the rolling counts for the last 9 events are
> ranked and emitted every 2 seconds? 7 seconds?

The RollingCountBolt "thinks" in seconds, so the 9 in your example is
a window length of 9 seconds, not 9 events.  Behind the scenes,
RollingCountBolt uses SlidingWindowCounter [1], which in turn is built
upon SlotBasedCounter [2].  Neither the SlidingWindowCounter nor the
SlotBasedCounter knows anything about time or durations (no seconds,
minutes, and such); they only count objects per slot.  This is by
design, as it decouples the responsibility of counting
(SlidingWindowCounter/SlotBasedCounter) from the responsibility of
tracking the time (RollingCountBolt).
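
The split between counting and time-keeping can be sketched roughly as
follows.  This is a simplified illustration, not the storm-starter
code: SlotCounterSketch and its method names are hypothetical, and the
real SlidingWindowCounter/SlotBasedCounter differ in detail.  The point
is that the counter only knows about slots; whoever calls
getCountsThenAdvance() decides how much time one slot represents.

```java
import java.util.HashMap;
import java.util.Map;

final class SlotCounterSketch<T> {

    private final Map<T, long[]> countsPerObject = new HashMap<>();
    private final int numSlots;
    private int headSlot = 0;  // slot that currently receives increments

    SlotCounterSketch(int numSlots) {
        if (numSlots < 2) {
            throw new IllegalArgumentException("need at least two slots");
        }
        this.numSlots = numSlots;
    }

    // Count one occurrence of obj in the current slot.
    void increment(T obj) {
        countsPerObject.computeIfAbsent(obj, k -> new long[numSlots])[headSlot]++;
    }

    // Return the totals across all slots, then slide the window by one
    // slot: the oldest slot is cleared and becomes the new current slot.
    // Objects whose total drops to zero are kept here for simplicity.
    Map<T, Long> getCountsThenAdvance() {
        Map<T, Long> totals = new HashMap<>();
        for (Map.Entry<T, long[]> entry : countsPerObject.entrySet()) {
            long sum = 0;
            for (long count : entry.getValue()) {
                sum += count;
            }
            totals.put(entry.getKey(), sum);
        }
        headSlot = (headSlot + 1) % numSlots;
        for (long[] slots : countsPerObject.values()) {
            slots[headSlot] = 0;
        }
        return totals;
    }

    public static void main(String[] args) {
        // Three slots; a time-aware caller would advance once per emit interval.
        SlotCounterSketch<String> counter = new SlotCounterSketch<>(3);
        counter.increment("storm");
        counter.increment("storm");
        System.out.println(counter.getCountsThenAdvance());  // window holds 2
        counter.increment("storm");
        System.out.println(counter.getCountsThenAdvance());  // window holds 3
        System.out.println(counter.getCountsThenAdvance());  // still 3
        System.out.println(counter.getCountsThenAdvance());  // first slot expired: 1
    }
}
```

A bolt that tracks time would call increment() for every incoming tuple
and getCountsThenAdvance() once per emit interval, which is where the
seconds come in.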

The Apache Spark project (Spark Streaming) has the same notion of
emitFrequencyInSeconds and windowLengthInSeconds, which they call
slideInterval and windowLength, respectively.  See
https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html.
They also have a diagram similar to the one I showed in [3] that
explains the idea behind sliding windows; see the section "Window
Operations" in the Spark link above.


Does that make sense?
Michael



[1]
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlidingWindowCounter.java
[2]
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlotBasedCounter.java
[3]
http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/


On 01.04.2014 18:45, Software Dev wrote:
> In the article
> (http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)
> and I was wondering what the rationale was for the emit frequencies
> and how they all relate to each other.
> 
> In the example the RollingCountBolt emits every 3 seconds, 
> IntermediateRankingBolt every 2 seconds and TotalRankingBolt every
> 2 seconds. Does this mean that the rolling counts for the last 9
> events are ranked and emitted every 2 seconds? 7 seconds? A little
> confused.
> 
> Thanks
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM7A2kACgkQeW5XuG18ujR93wCdHE6Ldu01fRgnMqjIi7chVMbu
uEMAnjUyrZQq0xkg2REUzbgvk31A85Dm
=YI7Y
-----END PGP SIGNATURE-----
