mattf-horton
[GitHub] metron issue #481: METRON-322 Global Batching and Flushing
Wed, 12 Jul 2017 21:19:44 GMT
Github user mattf-horton commented on the issue:
    Comment on testing:  There are so many permutations that it only seemed reasonable to automate
them in unit tests, and so I did.  As part of code review, please give your opinion on whether
the provided unit tests are adequate, or what additional test cases should be added.
    Manual end-to-end testing, if you are so moved, consists of six scenarios for a given
sensor queue:
    1. Under **heavy continuous load** the batchSize still controls flushing behavior, because
the queue size always exceeds batchSize before queue age exceeds batchTimeout.
    2. Under **light continuous load**, where each queue continues to receive at least one
message per second and batchSize is large enough that it is never exceeded, the batchTimeout
for each queue should control flushing behavior within +/- 1 sec, because each new message
triggers a check of the queue age and a potential timeout flush.
      - NOTE: If the configured batchTimeout is set to a large number, bigger than
`1/2 topology.message.timeout.secs - 1` (which equals **14 sec** by default), then it will be
replaced by an effective value equal to `1/2 topology.message.timeout.secs - 1`.  Flushing will
occur within +/- 1 sec of each _effective_ batchTimeout interval, rather than the _configured_
batchTimeout interval.
    3. Under **light intermittent load**, where fewer than batchSize messages queue up and
gaps between messages may exceed the timeout interval, age checks and potential flush
events may be triggered by _either_ incoming messages or TickTuple events, depending on the
phase relationship between intermittent bursts of messages and the TickTuple system tick.
The TickTuple interval is guaranteed to be < `1/2 topology.message.timeout.secs`, hence
the default TickTuple interval is 14 seconds (with Storm's default `topology.message.timeout.secs`
of 30).  But if the smallest batchTimeout configured for any sensor queue on the Bolt is
< the default TickTuple interval, then that smallest value becomes the actual TickTuple
interval.  This produces three sub-cases, all of which guarantee a flush event before any
message gets recycled due to aging past `topology.message.timeout.secs`:
      - If the queue's configured batchTimeout is the smallest (or only) batchTimeout on this
Bolt, and that number is smaller than the default TickTuple interval, then it _becomes_ the
actual TickTuple interval.  The queue is guaranteed to flush between 1x and 2x this interval.
      - If the queue's configured batchTimeout is not the smallest, but is still < the
default TickTuple interval, then the queue is guaranteed to flush between its own
`configured batchTimeout` and its `configured batchTimeout + actual TickTuple interval`
(which is less than 2x its own `configured batchTimeout`).
      - If the queue's configured batchTimeout is > the default TickTuple interval (14 sec
by default), then its effective batchTimeout is set to the default TickTuple interval.  The
queue is guaranteed to flush between this `effective batchTimeout` and its
`effective batchTimeout + actual TickTuple interval`.
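To make the three scenarios concrete, here is a minimal single-queue simulation in Python. It is a sketch under stated assumptions, not Metron's implementation: the names `SensorQueueSim` and `run` are hypothetical, time advances in whole seconds, "queue age" is taken to be the age of the oldest queued message, a size flush fires as soon as the queue reaches batchSize, and Tick events arrive every `tick_interval` seconds.

```python
from typing import List, Tuple


class SensorQueueSim:
    """Hypothetical model of one sensor queue's batching (not Metron's API)."""

    def __init__(self, batch_size: int, effective_timeout: int) -> None:
        self.batch_size = batch_size
        self.effective_timeout = effective_timeout
        self.pending: List[int] = []  # arrival time of each queued message
        self.flushes: List[Tuple[int, str, int]] = []  # (time, trigger, count)

    def _age(self, now: int) -> int:
        # Assumption: queue age = age of the oldest message still queued.
        return now - self.pending[0] if self.pending else 0

    def _flush(self, now: int, trigger: str) -> None:
        self.flushes.append((now, trigger, len(self.pending)))
        self.pending.clear()

    def on_message(self, now: int) -> None:
        # Every incoming message triggers a size check and an age check.
        self.pending.append(now)
        if len(self.pending) >= self.batch_size:
            self._flush(now, "size")
        elif self._age(now) >= self.effective_timeout:
            self._flush(now, "age-on-message")

    def on_tick(self, now: int) -> None:
        # A Tick event only performs the age check.
        if self.pending and self._age(now) >= self.effective_timeout:
            self._flush(now, "age-on-tick")


def run(queue: SensorQueueSim, message_times: List[int],
        tick_interval: int, end: int) -> List[Tuple[int, str, int]]:
    """Replay messages and periodic ticks second by second up to `end`."""
    for t in range(end + 1):
        # Messages arriving at time t are handled before a tick at the same t.
        for _ in range(message_times.count(t)):
            queue.on_message(t)
        if t > 0 and t % tick_interval == 0:
            queue.on_tick(t)
    return queue.flushes
```

For instance, with `batch_size=100`, `effective_timeout=8` and a 5 s tick, messages arriving at t=1 and t=2 are flushed by the Tick at t=10, when the oldest is 9 s old, which is inside the [8, 13] s window of the second sub-case. A burst of 12 messages against `batch_size=5` instead produces two size-triggered flushes, as in scenario 1.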
    The upshot is that:
    * "Configured batchTimeout" should be thought of as "minimum age before you'll allow a
time-based flush" (capped at the default TickTuple interval, i.e. `1/2 topology.message.timeout.secs - 1`)
    * "Actual TickTuple interval" is the "maximum time between age checks".  It will be <=
all the configured batchTimeouts for the various sensors on the Bolt.
    * A flush may actually happen as late as "effective batchTimeout" + "actual TickTuple
interval", depending on exactly when intermittent message events and periodic Tick events occur.
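These rules can be turned into a small calculation. The sketch below is illustrative only: the function names and the sensor-to-batchTimeout map are hypothetical, batchTimeouts are assumed to be positive whole seconds, and the cap is computed as `topology.message.timeout.secs // 2 - 1` to match the 14 s default quoted above.

```python
from typing import Dict, Tuple


def default_tick_interval(message_timeout_secs: int = 30) -> int:
    # Cap from the comment: 1/2 topology.message.timeout.secs - 1
    # (whole seconds assumed), i.e. 14 s for Storm's default of 30 s.
    return message_timeout_secs // 2 - 1


def flush_windows(configured: Dict[str, int], message_timeout_secs: int = 30
                  ) -> Tuple[int, Dict[str, Tuple[int, int]]]:
    """Return the actual tick interval and each queue's (earliest, latest)
    time-based flush age, following the rules summarized above."""
    cap = default_tick_interval(message_timeout_secs)
    # The smallest configured batchTimeout on the Bolt becomes the tick
    # interval if it is below the cap; otherwise the cap is used.
    tick = min([cap, *configured.values()])
    windows: Dict[str, Tuple[int, int]] = {}
    for sensor, timeout in configured.items():
        effective = min(timeout, cap)  # configured value, capped
        latest = effective + tick
        # Both terms are <= cap, so a flush always precedes the message timeout.
        assert latest < message_timeout_secs
        windows[sensor] = (effective, latest)
    return tick, windows
```

With batchTimeouts of 5, 8 and 60 s for three hypothetical queues, `flush_windows({"sensor_a": 5, "sensor_b": 8, "sensor_c": 60})` yields a tick interval of 5 s and windows of (5, 10), (8, 13) and (14, 19): one example each of the three sub-cases (1x to 2x the tick, configured to configured + tick, and capped to capped + tick), all well under the 30 s message timeout.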

