mattf-horton
[GitHub] metron pull request #614: METRON-992: Create performance tuning guide
Thu, 08 Jun 2017 21:15:14 GMT
    +# Metron Performance Tuning Guide
    +## Overview
    +This document provides guidance from our experiences tuning the Apache Metron Storm topologies
for maximum performance. You'll find
    +suggestions for optimum configurations under a 1 Gbps load along with some guidance around
the tooling we used to monitor and assess
    +our throughput.
    +In the simplest terms, Metron is a streaming architecture created on top of Kafka and
three main types of Storm topologies: parsers,
    +enrichment, and indexing. Each parser has its own topology and there is also a highly
performant, specialized spout-only topology
    +for streaming PCAP data to HDFS. We found that the architecture can be tuned almost exclusively
through using a few primary Storm and
    +Kafka parameters along with a few Metron-specific options. You can think of the data
flow as being similar to water flowing through a
    +pipe, and the majority of these options assist in tweaking the various pipe widths in
the system.
    +## General Suggestions
    +Note that there is currently no way to specify the number of tasks independently of the number
of executors in Flux topologies (enrichment,
    + indexing). By default, the number of tasks will equal the number of executors. Logically,
setting the number of tasks equal to the number
    +of executors is sensible. Storm enforces number of executors <= number of tasks. The reason you might
set the number of tasks higher than the number of
    +executors is for future performance tuning and rebalancing without the need to bring
down your topologies. The number of tasks is fixed
    +at topology startup time whereas the number of executors can be increased up to a maximum
value equal to the number of tasks.
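The constraint above can be sketched as a small shell check. This is only an illustration: `can_rebalance_to` is a hypothetical helper, not part of Storm or Metron, and the counts in the example are made up.

```shell
# Succeeds (exit status 0) when the requested executor count does not
# exceed the task count that was fixed at topology startup.
# Usage: can_rebalance_to <requested_executors> <tasks>
# Hypothetical helper for illustration only.
can_rebalance_to() {
  [ "$1" -le "$2" ]
}

# Example: a component started with 8 tasks (hypothetical numbers).
if can_rebalance_to 6 8; then
  echo "6 executors: ok"
fi
if ! can_rebalance_to 10 8; then
  echo "10 executors: exceeds task count"
fi
```

A component started with more tasks than executors leaves headroom for this check to pass later, which is the rebalancing case described above.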
    +We found that the default values for,, and max.uncommitted.offsets
worked well in nearly all cases.
    +As a general rule, it was optimal to set spout parallelism equal to the number of partitions
used in your Kafka topic. Any greater
    +parallelism will leave you with idle consumers since Kafka limits the max number of consumers
to the number of partitions. This is
    +important because Kafka has certain ordering guarantees for message delivery per partition
that would not be possible if more than
    +one consumer in a given consumer group were able to read from that partition.
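As a rough illustration of the rule above, the following sketch computes how many spout executors would sit idle for a given partition count. `idle_spouts` is a hypothetical helper and the numbers are invented; it assumes a single consumer group reading one topic.

```shell
# Print how many spout executors would be left without a partition.
# Usage: idle_spouts <spout_parallelism> <partition_count>
# Hypothetical helper for illustration only.
idle_spouts() {
  if [ "$1" -gt "$2" ]; then
    echo $(( $1 - $2 ))
  else
    echo 0
  fi
}

idle_spouts 12 10   # prints 2: two consumers have no partition to read
idle_spouts 10 10   # prints 0: one consumer per partition
```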
    +## Tooling
    +Before we get to the actual tooling used to monitor performance, it helps to describe
what we might actually want to monitor and potential
    +pain points. Prior to switching over to the new Storm Kafka client, which leverages the
new Kafka consumer API under the hood, offsets
    +were stored in Zookeeper. While the broker hosts are still stored in Zookeeper, this
is no longer true for the offsets which are now
    +stored in Kafka itself. This is a configurable option, and you may switch back to Zookeeper
if you choose, but Metron is currently using
    +the new defaults. This is useful to know as you're investigating both correctness as
well as throughput performance.
    +First we need to set up some environment variables:
    +export BROKERLIST=<your broker comma-delimited list of host:ports>
    +export ZOOKEEPER=<your zookeeper comma-delimited list of host:ports>
    +export KAFKA_HOME=<kafka home dir>
    +export METRON_HOME=<your metron home>
    +export HDP_HOME=<your HDP home>
    +If you have Kerberos enabled, set up the security protocol:
    +$ cat /tmp/consumergroup.config
    +Now run the following command for a running topology's consumer group. In this example
we are using enrichments.
    +${KAFKA_HOME}/bin/ \
    +    --command-config=/tmp/consumergroup.config \
    +    --describe \
    +    --group enrichments \
    +    --bootstrap-server $BROKERLIST \
    +    --new-consumer
    +This will return a table with the following output depicting offsets for all partitions
and consumers associated with the specified
    +consumer group:
    +GROUP                          TOPIC              PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             OWNER
    +enrichments                    enrichments        9          29746066        29746067        1               consumer-2_/
    +enrichments                    enrichments        3          29754325        29754326        1               consumer-1_/
    +enrichments                    enrichments        43         29754331        29754332        1               consumer-6_/
    +_Note_: You won't see any output until a topology is actually running because the consumer
groups only exist while consumers in the
    +spouts are up and running.
    +The column we care most about is LAG, which is the difference between the log end offset
and the current offset for the partition. This tells us how close we are to keeping up
    +with incoming data and, as we found through multiple trials, whether any specific
consumers are getting stuck.
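To get a single number for the whole consumer group, the LAG column can be summed with awk. This is a minimal sketch that assumes the column order shown in the table above (LAG as the sixth whitespace-separated field). `sum_lag` is a hypothetical helper name, and the sample rows are copied from that table.

```shell
# Sum the LAG column (6th field) of a consumer-group table read from stdin.
# Rows whose 6th field is not a plain number (such as the header) are skipped.
# Hypothetical helper; assumes the column layout shown in the table above.
sum_lag() {
  awk '$6 ~ /^[0-9]+$/ { total += $6 } END { print total + 0 }'
}

# Sample input taken from the table above.
printf '%s\n' \
  'GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG OWNER' \
  'enrichments enrichments 9 29746066 29746067 1 consumer-2_/' \
  'enrichments enrichments 3 29754325 29754326 1 consumer-1_/' \
  'enrichments enrichments 43 29754331 29754332 1 consumer-6_/' \
  | sum_lag   # prints 3
```

In practice the output of the describe command would be piped into `sum_lag` in place of the sample rows. A total that keeps growing between runs suggests the topology is not keeping up.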
    +Taking this one step further, it's probably more useful if we can watch the offsets and
lags change over time. In order to do this
    +we'll add a "watch" command and set the refresh rate to 10 seconds.
    +watch -n 10 -d ${KAFKA_HOME}/bin/ \
    +    --command-config=/tmp/consumergroup.config \
    +    --describe \
    +    --group enrichments \
    +    --bootstrap-server $BROKERLIST \
    +    --new-consumer
    +Every 10 seconds the command will re-run and the screen will be refreshed with new information.
The most useful bit is that the
    +watch command will highlight the differences between the current output and the last output.
    +We can also monitor our Storm topologies by using the Storm UI - see
    +And lastly, you can leverage some GUI tooling to make creating and modifying your Kafka
topics a bit easier -
    +## General Knobs and Levers
    Please spell out "number of", rather than using '#' as an abbreviation, since the sharp
sign conflicts with markdown syntax.  Better yet, does it make sense to actually give the
config parameter name?  Or a statement of where and what one changes in the Ambari UI?  Thanks.

