metron-dev mailing list archives

From james-sirota
Subject [GitHub] incubator-metron issue #453: METRON-694: Index Errors from Topologies
Date Thu, 02 Mar 2017 17:20:03 GMT
Github user james-sirota commented on the issue:
    Hi guys, this PR is built on one fundamental assumption: Kafka is always available.  The
source of truth for errors, therefore, is a Kafka topic.  In a production setting, errors should
go into their own topic, and the retention period (or size) of that topic should be set
very high so that you retain as many errors as you can.  The reason we are making
this configurable is so that we can more easily test this in Ansible by throwing both errors and
valid telemetry into the same topic.  In production we would not do this; we would have a
dedicated topic and a dedicated topology for error writing, with parallelism tuned way down
to prioritize ingest of actual valid telemetry over errors.  The writing topology should attempt
to write each error from the queue to ES, HDFS, or both exactly once.  If it cannot
do that, then it should notify whatever infrastructure monitoring component you are using
that your ES or HDFS is down.  That, however, is a different PR and is out of context
here.  I will need to file that as follow-on work.
    With that said, I personally see no problem with the way this PR is implemented.  It allows
for a dedicated topic and for writing errors into ES or HDFS exactly once when running in a
production setting.  There is an option to configure the topic so you can have telemetry and
errors in the same topic for testing on Ansible.  So +1 from me.
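
To make the design above concrete, here is a minimal sketch of an error writer in Python. It is not the code from this PR. It shows one common way to approximate exactly-once writes: replay from the last committed offset and key every write by (partition, offset), so a retry overwrites rather than duplicates. The names IdempotentSink, SinkDown, drain_errors, and the alert callback are all hypothetical, and the in-memory list of records only stands in for a Kafka error topic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]            # (partition, offset) of a record in the error topic
Record = Tuple[int, int, str]    # (partition, offset, payload)


class SinkDown(Exception):
    """Raised when a sink's backing store (e.g. ES or HDFS) is unreachable."""


@dataclass
class IdempotentSink:
    """Hypothetical stand-in for an ES or HDFS writer.

    Writes are keyed by (partition, offset), so replaying a record
    overwrites the same entry instead of creating a duplicate.
    """
    name: str
    available: bool = True
    store: Dict[Key, str] = field(default_factory=dict)

    def write(self, key: Key, payload: str) -> None:
        if not self.available:
            raise SinkDown(self.name)
        self.store[key] = payload


def drain_errors(
    records: List[Record],
    sinks: List[IdempotentSink],
    committed: Dict[int, int],
    alert: Callable[[str], None],
) -> Dict[int, int]:
    """Write every uncommitted record to all sinks, in order.

    An offset is committed only after every sink accepted the record.
    If a sink is down, the alert callback is invoked with the sink name
    and draining stops, so the record is retried on the next call.
    """
    for partition, offset, payload in records:
        if offset <= committed.get(partition, -1):
            continue  # already written to every sink
        try:
            for sink in sinks:
                sink.write((partition, offset), payload)
        except SinkDown as exc:
            alert(str(exc))  # assumption: monitoring is reachable via a callback
            break
        committed[partition] = offset
    return committed
```

In this sketch a sink outage never loses an error: the record stays in the topic (which is why the long retention matters) and is replayed once the sink is back, while the keyed write keeps the replay from producing duplicates.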

