metron-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (METRON-1505) Intermittent Profiler Integration Test Failure
Date Mon, 02 Apr 2018 17:35:00 GMT

    [ https://issues.apache.org/jira/browse/METRON-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16422827#comment-16422827
] 

ASF GitHub Bot commented on METRON-1505:
----------------------------------------

GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/977

    METRON-1505 Intermittent Profiler Integration Test Failure

    ### Problem
    
    The integration tests were failing intermittently when Storm unexpectedly expired messages
generated by the integration tests.  When Storm expired these messages they were never received
by the Profiler bolts, which caused the integration tests to fail.
    
    ### Root Cause
    
    Storm's event window mechanism was not configured correctly to use the timestamp extracted
from the telemetry message.  Storm was instead defaulting to system time.  
    
    If the time when the downstream `ProfileBuilderBolt` processed a message differed significantly
enough from when the upstream `ProfileSplitterBolt` processed the message, the message would
be errantly expired by Storm.  
    
    This is why the problem could only be replicated when run in Travis, a resource constrained
environment.  When run on any other environment, the system time when these two events occur
will not differ enough for Storm to mistakenly expire the test messages.
    
    This did not necessarily matter for the core functioning of the Profiler, as the Profiler
itself continued to use the correct event timestamps.  This bug only affected significantly
out-of-order messages and the flushing of expired profiles for the integration tests.
    
    ### The Fix
    
    The simple fix was to ensure that Storm uses the correct event timestamp field.  Doing
this highlighted another problem.  Storm does not work correctly when using tick tuples along
with an event timestamp field.  Storm will attempt to extract an event timestamp from the
tick tuple, which will not exist and cause the entire topology to fail.
    
    This meant that I could not use tick tuples.  To work around this, I created a separate
thread that flushes the expired profiles regularly.  The separate thread introduces thread
safety concerns, so I also needed to perform some locking.
    
    ### Changes
    
    Most of these changes were done in separate commits to making review easier.
    
    1. Added a separate thread to the `ProfileBuilderBolt` to flush expired profiles regularly.
 This is the core fix to the integration test bug.
    
    2. Corrected the key generated to cache `ProfileBuilder` objects.  This previously relied
on the underlying `ProfileConfig.toString` method which was error prone and slow.  It now
uses the hash key.
    
    3. Reduced the number of Profiler integration tests.  There is now one integration test
that tests event time processing and another that tests the same profile using processing
time.
    
        Previously there were a number of different profiles that were tested.  This was necessary
before as the integration tests were the only effective way to test different profile logic.
 Since then, significant refactoring has occurred which allowed the same logic to be tested
in unit tests rather than in integration tests.  
    
        This allowed me to clean-up these tests which reduces run time and complexity in the
integration tests.
    
    4. Added some simple debug logging to `HBaseBolt`.
    
    ## Pull Request Checklist
    
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at
[Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are
trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically
master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed
or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested
manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the
root metron folder
    - [ ] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way
that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally
with Vagrant full-dev environment or the equivalent?
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1505

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/977.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #977
    
----
commit e9185e2249bfcca1c47d3809e05bf742a73ff80e
Author: Nick Allen <nick@...>
Date:   2018-03-26T12:13:26Z

    Added debug logging to HbaseBolt

commit ab3e6ad2cefb236a3de8a7c3dec3c100761ffc57
Author: Nick Allen <nick@...>
Date:   2018-03-26T15:34:13Z

    Corrected cache key usage and logging

commit 1cda9523fe65ce8ef9ec40305b873e6ed620cc78
Author: Nick Allen <nick@...>
Date:   2018-03-23T22:48:38Z

    A separate timer thread is used to flush expired profiles.

----


> Intermittent Profiler Integration Test Failure
> ----------------------------------------------
>
>                 Key: METRON-1505
>                 URL: https://issues.apache.org/jira/browse/METRON-1505
>             Project: Metron
>          Issue Type: Bug
>            Reporter: Nick Allen
>            Assignee: Nick Allen
>            Priority: Major
>
> The Profiler integration tests which use processing time fail intermittently when run
in Travis CI.
> {code:java}
> 2018-03-22 22:00:35 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with
timestamp=1521756035759
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with
timestamp=1521756035802
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 0 route(s) for message with
timestamp=1521756035806
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2,
entity=10.0.0.2, timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with
timestamp=1521756035807
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2,
entity=10.0.0.2, timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with
timestamp=1521756035808
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2,
entity=10.0.0.2, timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with
timestamp=1521756035813
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2,
entity=10.0.0.3, timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with
timestamp=1521756035814
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2,
entity=10.0.0.3, timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with
timestamp=1521756035816
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:195 - Found route for message; profile=example2,
entity=10.0.0.3, timestamp=1521756035817
> 2018-03-22 22:00:35 DEBUG ProfileSplitterBolt:201 - Found 1 route(s) for message with
timestamp=1521756035817
> 2018-03-22 22:00:41 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756041122}
> 2018-03-22 22:00:41 DEBUG WindowManager:212 - [6] events expired from window.
> 2018-03-22 22:00:41 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:00:41 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:00:46 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756046122}
> 2018-03-22 22:00:46 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:46 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:00:51 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756051122}
> 2018-03-22 22:00:51 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:51 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:00:56 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756056122}
> 2018-03-22 22:00:56 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:00:56 DEBUG WindowManager:144 - invoking windowLifecycleListener onActivation,
[1] events in window.
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 tuple(s),
0 expired, 1 new
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:00:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles and found
0 measurement(s).
> 2018-03-22 22:00:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms until flush;
currentTime=0, flushTime=0
> 2018-03-22 22:01:01 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756061122}
> 2018-03-22 22:01:01 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:01 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:01:01 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:06 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756066122}
> 2018-03-22 22:01:06 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:06 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:11 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756071122}
> 2018-03-22 22:01:11 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:11 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:16 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756076122}
> 2018-03-22 22:01:16 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:16 DEBUG WindowManager:144 - invoking windowLifecycleListener onActivation,
[1] events in window.
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 tuple(s),
0 expired, 1 new
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles and found
0 measurement(s).
> 2018-03-22 22:01:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms until flush;
currentTime=0, flushTime=0
> 2018-03-22 22:01:21 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756081122}
> 2018-03-22 22:01:21 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:21 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:01:21 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:26 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756086122}
> 2018-03-22 22:01:26 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:26 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:31 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756091122}
> 2018-03-22 22:01:31 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:31 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:36 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756096122}
> 2018-03-22 22:01:36 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:36 DEBUG WindowManager:144 - invoking windowLifecycleListener onActivation,
[1] events in window.
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 tuple(s),
0 expired, 1 new
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:36 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles and found
0 measurement(s).
> 2018-03-22 22:01:36 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms until flush;
currentTime=0, flushTime=0
> 2018-03-22 22:01:41 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756101122}
> 2018-03-22 22:01:41 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:01:41 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:01:41 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:46 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756106122}
> 2018-03-22 22:01:46 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:46 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:51 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756111122}
> 2018-03-22 22:01:51 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:51 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:01:56 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756116122}
> 2018-03-22 22:01:56 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:01:56 DEBUG WindowManager:144 - invoking windowLifecycleListener onActivation,
[1] events in window.
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 tuple(s),
0 expired, 1 new
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:01:56 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles and found
0 measurement(s).
> 2018-03-22 22:01:56 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms until flush;
currentTime=0, flushTime=0
> 2018-03-22 22:02:01 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756121122}
> 2018-03-22 22:02:01 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:02:01 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:02:01 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:02:06 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756126122}
> 2018-03-22 22:02:06 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:06 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:02:11 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756131122}
> 2018-03-22 22:02:11 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:11 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:02:16 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756136122}
> 2018-03-22 22:02:16 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:16 DEBUG WindowManager:144 - invoking windowLifecycleListener onActivation,
[1] events in window.
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:276 - Tuple window contains 1 tuple(s),
0 expired, 1 new
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:365 - Emitted 0 measurement(s).
> 2018-03-22 22:02:16 DEBUG ProfileBuilderBolt:325 - Flushed expired profiles and found
0 measurement(s).
> 2018-03-22 22:02:16 DEBUG FixedFrequencyFlushSignal:114 - Flush=false, '0' ms until flush;
currentTime=0, flushTime=0
> 2018-03-22 22:02:21 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756141122}
> 2018-03-22 22:02:21 DEBUG WindowManager:212 - [1] events expired from window.
> 2018-03-22 22:02:21 DEBUG WindowManager:214 - invoking windowLifecycleListener.onExpiry
> 2018-03-22 22:02:21 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:02:26 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756146122}
> 2018-03-22 22:02:26 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:26 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:02:31 DEBUG WindowManager:189 - Scan events, eviction policy TimeEvictionPolicy{windowLength=5000,
referenceTime=1521756151122}
> 2018-03-22 22:02:31 DEBUG WindowManager:212 - [0] events expired from window.
> 2018-03-22 22:02:31 DEBUG WindowManager:147 - No events in the window, skipping onActivation
> 2018-03-22 22:02:34 DEBUG WindowManager:154 - Shutting down WindowManager
> 2018-03-22 22:02:46 DEBUG FixedFrequencyFlushSignal:67 - Flush counters reset{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message