metron-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From nickwallen <...@git.apache.org>
Subject [GitHub] metron pull request #1289: METRON-1810 Storm Profiler Intermittent Test Fail...
Date Wed, 05 Dec 2018 20:38:05 GMT
GitHub user nickwallen opened a pull request:

    https://github.com/apache/metron/pull/1289

    METRON-1810 Storm Profiler Intermittent Test Failure

    This PR hopefully resolves some of the more frequent, intermittent `ProfilerIntegrationTest`
failures.  
    
    ### Testing
    
    Run the integration tests and they should not fail.
    
    I have repeatedly run the integration tests on my laptop and have yet to see a failure.
 I am also repeatedly triggering Travis CI builds on this branch to see if any failures occur
there.  I can't prove the problem is solved, but I am hoping this helps.
    
    ### Changes
    
    * This change uses Caffeine's CacheWriter in place of the RemovalListener for profile
expiration.  This is explained more in the next section.
    
    * Removed the ability to define the cache maintenance executor for the `DefaultMessageDistributor`.
 This was only needed for testing RemovalListeners, which no longer exist.
    
    * I re-jiggered some of the integration tests to provide more information when they do
fail. I removed the use of `waitOrTimeout` and replaced it with `assertEventually`.  I am
also using a different style of assertion; Hamcrest-y.  These changes should provide additional
information and also be more consistent with the rest of the code base.
    
    * After removing the dependency that provided `waitOrTimeout`, I had to rework some of
the project dependencies because multiple versions of`httpclient` lib was being pulled in.
 This caused the new REST function in `stellar-common` to blow up when a Stellar execution
environment is loaded during the integration test.
    
    * Added additional debug logging for the caches including an estimate of the cache size.
    
    #### RemovalListener to CacheWriter
    
    Profiles are designed to expire and flush after a fixed period of time; the TTL.  The
integration tests rely on some of the profile values being flushed by this TTL mechanism.
 
    
    The TTL mechanism is driven by cache expiration. There is a cache of "active" profiles.
 This cache is configured to have values timeout after they have not been accessed for a fixed
period of time.  Once they timeout from the active cache, they are placed on an "expired"
cache.  Messages are not applied to expired profiles, but these expired profiles hang around
for a fixed period of time to allow them to be flushed.
    
    Previously, a Caffeine [RemovalListener](https://github.com/ben-manes/caffeine/wiki/Removal#removal-listeners)
was used so that when a profile expires from the active cache, it is placed into the expired
cache. When the tests fail, it seems that the `RemovalListener` is not notified in a timely
fashion, so the profile doesn't make it to the expired cache and so never flushes to be read
by the integration test.
    
     In Caffeine, these listeners are notified asynchronously, on a separate thread (via ForkJoinPool.commonPool()),
not inline with cache reads or writes.  For running tests that depend on RemovalListener's
it is recommended to set the cache maintenance executor to something like `MoreExecutors.sameThreadExecutor()`
so that cache maintenance is executed on the main execution thread when `cleanUp` is called.
 This was done for the unit tests, but was not done for the integration tests. 
    
    The Caffeine Wiki mentions that when notification should be performed synchronously, which
logically works for this use case, to use a [CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer)
instead.
    
    > Removal listener operations are executed asynchronously using an Executor. The default
executor is ForkJoinPool.commonPool() and can be overridden via Caffeine.executor(Executor).
When the operation must be performed synchronously with the removal, use [CacheWriter](https://github.com/ben-manes/caffeine/wiki/Writer)
instead.
    
    This does not negatively impact production performance because the `ActiveCacheWriter`
only does work on a delete which occurs only rarely when a profile stops receiving messages,
not on a write.  In addition, these caches are very read-heavy as in most cases the cache
is only written to when a new profile is defined or on topology start-up.
    
    ## Pull Request Checklist
    - [ ] Is there a JIRA ticket associated with this PR? If not one needs to be created at
[Metron Jira](https://issues.apache.org/jira/browse/METRON/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel).
    - [ ] Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are
trying to resolve? Pay particular attention to the hyphen "-" character.
    - [ ] Has your PR been rebased against the latest commit within the target branch (typically
master)?
    - [ ] Have you included steps to reproduce the behavior or problem that is being changed
or addressed?
    - [ ] Have you included steps or a guide to how the change may be verified and tested
manually?
    - [ ] Have you ensured that the full suite of tests and checks have been executed in the
root metron folder via:
      ```
      mvn -q clean integration-test install && dev-utilities/build-utils/verify_licenses.sh

      ```
    - [ ] Have you written or updated unit tests and or integration tests to verify your changes?
    - [ ] If adding new dependencies to the code, are these dependencies licensed in a way
that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
    - [ ] Have you verified the basic functionality of the build by building and running locally
with Vagrant full-dev environment or the equivalent?
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nickwallen/metron METRON-1810

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/metron/pull/1289.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1289
    
----
commit db902ea6db116cdb92516a68196270dd61f2b566
Author: Nick Allen <nick@...>
Date:   2018-12-01T13:39:43Z

    WIP.. Still getting reliable failures when run in IDE

commit f68d89999310392c8354f64be34c3cae3bac0b86
Author: Nick Allen <nick@...>
Date:   2018-12-04T13:45:50Z

    WIP

commit 1d13682b14df6d7d389d633f07a2f3b4d156b3ef
Author: Nick Allen <nick@...>
Date:   2018-12-04T16:40:42Z

    Resolve classpath issues

commit 67d945781d12a952765ccba5272d01e30c451599
Author: Nick Allen <nick@...>
Date:   2018-12-04T17:09:03Z

    Undo change to GetProfileT
    est

commit 21ab3384cf73d005118b3ae04a4998f7718b7bd3
Author: Nick Allen <nick@...>
Date:   2018-12-04T19:35:46Z

    Enhanced logging of caches

commit 3728a55287ce7cee9242399da9cec3236c0e522d
Author: Nick Allen <nick@...>
Date:   2018-12-04T19:39:02Z

    Making sure useful information is available if failure occurs

commit 67574b8a69acf21a46b6caae51e495f02a55f53b
Author: Nick Allen <nick@...>
Date:   2018-12-04T19:42:01Z

    Restore original logger props

commit d4e144a2cd55f785776e03f424560b60fa98eaa7
Author: Nick Allen <nick@...>
Date:   2018-12-04T20:05:56Z

    Refactored testProcessingTimeWithTimeToLiveFlush to use assertEventually

commit 44df64df63f3aa26b688f196f8fe7576a8ccd9bc
Author: Nick Allen <nick@...>
Date:   2018-12-05T18:36:22Z

    Using Caffeine's CacheWriter instead of a RemovalListener for profile expiration

----


---

Mime
View raw message