metron-dev mailing list archives

From mattf-horton <...@git.apache.org>
Subject [GitHub] incubator-metron issue #481: METRON-322 Global Batching and Flushing
Date Tue, 21 Mar 2017 14:30:41 GMT
Github user mattf-horton commented on the issue:

    https://github.com/apache/incubator-metron/pull/481
  
    @cestella, thanks for looking at this. The primary motivation for adding `batchTimeout`
was to prevent tuples from being timed out and replayed under "topology.message.timeout.secs"
while they wait in a partially filled batch.  Thus, correctly configuring the timeout period
requires interacting with the Bolt.  You're correct that this means each Bolt that uses
`BulkWriterComponent` must be modified to include the tick tuple processing,
as noted in my opening comments:
    ```
    After this patch is reviewed and accepted, similar work needs to be done for the ParserWriter,
and possibly other sub-components. That will be in a separate PR.
    ```
    I implemented the changes in `BulkWriterComponent` such that it would default to conservative
behavior if the containing Bolt didn't configure it.
    
    I considered using a timer thread instead of tick tuples, but:
    1. This is precisely one of the use cases contemplated by the Storm team when they created
Tick Tuples, as discussed in the [article here](https://hortonworks.com/blog/apache-storm-design-pattern-micro-batching/)
cited in the JIRA for METRON-322.
    2. It isn't sufficient to just create a timer thread.  One must also monitor that thread,
be able to restart it if it dies, make sure it doesn't do anything non-thread-safe, etc.
These requirements add significant complexity to the code, and the thread-safety requirement
adds uncertainty as well, since any pattern we create here will surely be imitated by later
development, and Bolt code is not typically thread-safe.
    3. On the other hand, using the built-in Tick Tuples avoids both the complexity, since
Storm handles the reliability issues internally, and the uncertainty, since the Tick Tuple
is processed in the single flow of control of normal Bolt processing.
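
    To illustrate the single-flow-of-control point, here is a minimal sketch in plain Java
with no Storm dependency. It is not the code in this PR: `TickFlushedBatcher`, `onMessage`,
and `onTick` are hypothetical names, and timestamps are passed in explicitly so the behavior
is deterministic. Because data and ticks arrive through the same calling thread, the batch
state needs no locks and no timer thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a batching writer (NOT Metron's BulkWriterComponent).
 * Every method is assumed to be called from one thread, the way a Bolt's
 * execute() receives both data tuples and tick tuples.
 */
class TickFlushedBatcher {
    private final int batchSize;
    private final long batchTimeoutMs;
    private final List<String> pending = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();
    private long oldestPendingMs = -1L; // arrival time of the oldest queued message

    TickFlushedBatcher(int batchSize, long batchTimeoutMs) {
        this.batchSize = batchSize;
        this.batchTimeoutMs = batchTimeoutMs;
    }

    /** Data path: queue the message and flush once the batch is full. */
    void onMessage(String message, long nowMs) {
        if (pending.isEmpty()) {
            oldestPendingMs = nowMs;
        }
        pending.add(message);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Tick path: flush a partial batch whose oldest message has waited too long. */
    void onTick(long nowMs) {
        if (!pending.isEmpty() && nowMs - oldestPendingMs >= batchTimeoutMs) {
            flush();
        }
    }

    // Records the batch instead of writing it anywhere; a real writer would
    // write the batch out and ack its tuples at this point.
    private void flush() {
        flushed.add(new ArrayList<>(pending));
        pending.clear();
        oldestPendingMs = -1L;
    }

    List<List<String>> flushedBatches() {
        return flushed;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

    In a real Bolt, the tick would typically be recognized by comparing the tuple's source
component and stream id against Storm's `Constants.SYSTEM_COMPONENT_ID` and
`Constants.SYSTEM_TICK_STREAM_ID`, with the tick frequency requested through
`Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS` in `getComponentConfiguration()`. How that wiring looks
in each Metron Bolt is outside this sketch.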
    
    So I think it's cleaner to use the feature provided by the Storm environment.  I'm open
to arguments to the contrary.


