metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mattf-horton <...@git.apache.org>
Subject [GitHub] incubator-metron pull request #442: METRON-322 Global Batching and Flushing
Date Mon, 06 Feb 2017 06:38:06 GMT
GitHub user mattf-horton opened a pull request:

    https://github.com/apache/incubator-metron/pull/442

    METRON-322 Global Batching and Flushing

    DO NOT INTEGRATE YET.  This is a preliminary review request.
    This Jira Ticket is to add timeout flushing to all Writers that do batch writes.  Currently
most of them will wait indefinitely for a batch to fill, resulting in message recycling due
to `topology.message.timeout.secs`.  This changeset so far implements timeout flushing for
the BulkMessageWriterBolt.
    
    I am now starting to work on unit tests, so while this code compiles and passes existing
unit tests, the new functionality has not yet been tested.  However, I would be grateful for
a quick review of the basic approach, focusing on changes in:
    * BulkMessageWriterBolt
    * BulkWriterComponent
    * BatchTimeoutHelper (new), and 
    * IndexingConfigurations.  
    
    All of the other changes are basically boilerplate, just laying "getBatchTimeout()" facade
methods alongside existing "getBatchSize()" facade methods.  (There's a lot of layers!)
    
    The flush-on-timeout logic is fairly straightforward.  It was implemented by a refactoring
of BulkWriterComponent and basic Tick Tuple reception in BulkMessageWriterBolt.
    
    The tricky part was figuring out the appropriate setting for 'topology.tick.tuple.freq.secs'
if the administrator configures non-default batchTimeouts.  It is necessary to enumerate the
batchTimeout settings for all configured sensorNames, which is implemented in IndexingConfigurations$getAllConfiguredTimeouts().
 Then multiple other factors must be taken into account to determine the allowed and recommended
settings, which is implemented in BatchTimeoutHelper.  If there are better ways to accomplish
these things, please share your ideas.
    
    After feedback is incorporated, I need to make similar changes to the ParserWriter, and
possibly other places in the code where batching is used and timeouts are not yet implemented.
 That is separable work, which I'm aware also needs to be done.  Thanks for taking time to
look at this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mattf-horton/incubator-metron METRON-322

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/442.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #442
    
----
commit 6fdf67c962a4bcfeb43da9c1aa1318ace6b194c3
Author: mattf-horton <mfoley@hortonworks.com>
Date:   2017-02-02T00:54:17Z

    METRON-322 The first easy changes: add batchTimeout most places batchSize is currently
used.

commit b3eeb3dbbc0834c4c9f3d2fc24d099f2f8e5a3ea
Author: mattf-horton <mfoley@hortonworks.com>
Date:   2017-02-02T08:57:41Z

    mods for flush and timeout in BulkWriterComponent.  Some work in BulkMessageWriterBolt,
but no getComponentConfiguration() yet.

commit 666f14a0ffe0eefc7de12e226d6953fd3e32d2d5
Author: mattf-horton <mfoley@hortonworks.com>
Date:   2017-02-06T06:06:07Z

    full implementation of getComponentConfiguration() method in BulkMessageWriterBolt, with
implementation details in IndexingConfigurations and new BatchTimeoutHelper class.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message