metron-dev mailing list archives

From cestella <...@git.apache.org>
Subject [GitHub] incubator-metron issue #486: METRON-793: Migrate to storm-kafka-client kafka...
Date Wed, 22 Mar 2017 15:41:56 GMT
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/486
  
    # Testing Plan
    ## Preliminaries
    
    * Please perform the following tests on the `full-dev` vagrant environment.
    * Set an environment variable to indicate `METRON_HOME`:
    `export METRON_HOME=/usr/metron/0.3.1` 
    
    
    ## Ensure Data Flows into the Indices
    Ensure that with a basic full-dev deployment we get data into the Elasticsearch
    indices and into HDFS.
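    A quick way to spot-check both (a sketch assuming the default Elasticsearch port and
    the indexing topology's default HDFS output path; adjust for your layout):
    ```
    # Elasticsearch: the sensor indices should exist and be gaining documents
    curl "http://localhost:9200/_cat/indices?v" 2> /dev/null
    # HDFS: the indexing topology writes under /apps/metron/indexing/indexed by default
    hadoop fs -ls -R /apps/metron/indexing/indexed
    ```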
    
    ## (Optional) Free Up Space on the Virtual Machine
    
    First, let's free up some headroom on the virtual machine.  If you are running this on a
    multinode cluster, you would not have to do this.
    * Stop and disable Metron in Ambari
    * Kill monit via `service monit stop`
    * Kill tcpreplay via `for i in $(ps -ef | grep tcpreplay | awk '{print $2}');do kill -9 $i;done`
    * Kill yaf via `for i in $(ps -ef | grep yaf | awk '{print $2}');do kill -9 $i;done`
    * Kill bro via `for i in $(ps -ef | grep bro | awk '{print $2}');do kill -9 $i;done`
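    (Aside: the `grep`-based loops above will briefly match the `grep` process itself;
    a minimal alternative sketch using `pkill`, assuming the same process names, avoids that:)
    ```
    for svc in tcpreplay yaf bro; do
      pkill -9 -f $svc
    done
    ```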
    
    ## Test the PCAP topology
    
    A new kafka spout necessitates retesting the pcap topology.
    ### Install and start pycapa 
    ```
    # set env vars
    export PYCAPA_HOME=/opt/pycapa
    export PYTHON27_HOME=/opt/rh/python27/root
    
    # Install these packages via yum (RHEL, CentOS)
    yum -y install epel-release centos-release-scl
    yum -y install "@Development tools" python27 python27-scldevel python27-python-virtualenv libpcap-devel libselinux-python
    
    # Set up directories
    mkdir $PYCAPA_HOME && chmod 755 $PYCAPA_HOME
    
    # Create the virtualenv inside $PYCAPA_HOME
    cd $PYCAPA_HOME
    export LD_LIBRARY_PATH="/opt/rh/python27/root/usr/lib64"
    ${PYTHON27_HOME}/usr/bin/virtualenv pycapa-venv
    
    # Copy pycapa: copy incubator-metron/metron-sensors/pycapa from the Metron source
    # tree into $PYCAPA_HOME on the node where you would like to install pycapa.
    
    # Build it
    cd ${PYCAPA_HOME}/pycapa
    # activate the virtualenv
    source ${PYCAPA_HOME}/pycapa-venv/bin/activate
    pip install -r requirements.txt
    python setup.py install
    
    # Run it
    cd ${PYCAPA_HOME}/pycapa-venv/bin
    pycapa --producer --topic pcap -i eth1 -k node1:6667
    ```
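    Before moving on, you can sanity check that packets are landing in the `pcap` kafka
    topic (the messages are raw binary, so expect unprintable output):
    ```
    /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper node1:2181 --topic pcap --max-messages 5
    ```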
    ### Ensure the pcap topology can write to HDFS
    * Ensure that `/apps/metron/pcap` exists and can be written to by the
      storm user.  If not, then:
    ```
    sudo su - hdfs
    hadoop fs -mkdir -p /apps/metron/pcap
    hadoop fs -chown metron:hadoop /apps/metron/pcap
    hadoop fs -chmod 775 /apps/metron/pcap
    ``` 
    * Start the pcap topology via `$METRON_HOME/bin/start_pcap_topology.sh`
    * Start the pycapa packet capture producer on eth1 via `${PYCAPA_HOME}/pycapa-venv/bin/pycapa --producer --topic pcap -i eth1 -k node1:6667`
    * Watch the topology in the Storm UI and, once the number of packets ingested is over 3k, kill the packet capture utility from before.  Ensure that at least 3 files exist on HDFS by running `hadoop fs -ls /apps/metron/pcap`
    * Choose a file (denoted by $FILE) and dump a few of its contents using the pcap_inspector utility via `$METRON_HOME/bin/pcap_inspector.sh -i $FILE -n 5`
    * Choose one of the lines and note the protocol.
      * Note that when you run the commands below, the resulting file will be placed in the directory from which you kicked off the job.
    * Run a Stellar filter query by executing a command similar to the following, with the values noted above (match your start_time format to the date format provided; the default is milliseconds since the epoch):
    ```
    $METRON_HOME/bin/pcap_query.sh query -st "20160617" -df "yyyyMMdd" -query "protocol == 6" -rpf 500
    ```
    * Verify the MR job finishes successfully. Upon completion, you should see multiple files with relatively current datestamps in your current directory, e.g. `pcap-data-20160617160549737+0000.pcap`
    * Copy the files to your local machine and verify you can open them in Wireshark. I chose a middle file and the last file. The middle file should have 500 records (per the records_per_file option), and the last one will likely have a number of records <= 500.
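    If you don't have Wireshark handy, a rough per-file record count works as a sanity
    check (assuming tcpdump is installed; $FILE is one of the result files):
    ```
    tcpdump -nn -r $FILE | wc -l
    ```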
    
    ## Test the Profiler
    
    ### Setup
    * Ensure that Metron is stopped and put in maintenance mode in Ambari
    * Create the profiler hbase table
    `echo "create 'profiler', 'P'" | hbase shell`
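    You can confirm the table was created via
    `echo "describe 'profiler'" | hbase shell`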
    
    * Create `~/rand_gen.py` and paste in the following:
    ```
    #!/usr/bin/python
    import random
    import sys
    import time
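    
    # Emit one JSON record every freq_s seconds, with "value" drawn from a
    # gaussian with mean mu and standard deviation sigma.
    # Usage: rand_gen.py <mu> <sigma> <freq_s>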
    def main():
      mu = float(sys.argv[1])
      sigma = float(sys.argv[2])
      freq_s = int(sys.argv[3])
      while True:
        out = '{ "value" : ' + str(random.gauss(mu, sigma)) + ' }'
        print out
        sys.stdout.flush()
        time.sleep(freq_s)
    
    if __name__ == '__main__':
      main()
    ```
    This will generate random JSON maps with a numeric field called `value`.
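    For example, `python ~/rand_gen.py 0 1 1` emits one record per second drawn from a
    gaussian with mean 0 and standard deviation 1, shaped like this (the values are
    random, so yours will differ):
    ```
    { "value" : 0.5247746858616111 }
    { "value" : -1.2501779500454732 }
    ```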
    
    * Set the profiler to use 1 minute tick durations:
      * Edit `$METRON_HOME/config/profiler.properties` to adjust the capture duration by changing `profiler.period.duration=15` to `profiler.period.duration=1`
      * Edit `$METRON_HOME/config/zookeeper/global.json` and add the following properties:
    ```
    "profiler.client.period.duration" : "1",
    "profiler.client.period.duration.units" : "MINUTES"
    ```
    ### Deploy the custom parser
    
    * Edit the value parser config at `$METRON_HOME/config/zookeeper/parsers/value.json`:
    ```
    {
      "parserClassName" : "org.apache.metron.parsers.json.JSONMapParser",
      "sensorTopic" : "value",
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "num_profiles_parser", "mean_parser" ],
          "config" : {
            "num_profiles_parser" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
            "mean_parser" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
          }
        }
      ]
    }
    ```
    
    * Edit the value enrichment config at `$METRON_HOME/config/zookeeper/enrichments/value.json`:
    ```
    {
      "enrichment" : {
        "fieldMap" : {
          "stellar" : {
            "config" : {
              "num_profiles_enrichment" : "LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))",
              "mean_enrichment" : "STATS_MEAN(STATS_MERGE(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us'))))"
            }
          }
        }
      }
    }
    ```
    * Create the value kafka topic:
      `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic value --partitions 1 --replication-factor 1`
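    * (Optional) Confirm the topic exists via
      `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --list | grep -w value`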
    * Push the configs via `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Start the parser via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s value`
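    * (Optional) Verify the configs landed in zookeeper by dumping them back via
      `$METRON_HOME/bin/zk_load_configs.sh -m DUMP -z node1:2181`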
    
    
    ### Start the profiler
    
    * Edit `$METRON_HOME/config/zookeeper/profiler.json` and paste in the following:
    ```
    {
      "profiles": [
        {
          "profile": "stat",
          "foreach": "'global'",
          "onlyif": "true",
          "init" : { },
          "update": {
            "s": "STATS_ADD(s, value)"
          },
          "result": "s"
        }
      ]
    }
    ```
    
    * Start the profiler via `$METRON_HOME/bin/start_profiler_topology.sh`
    
    ### Test Case
    
    * Set up a profile to accept some synthetic data with a numeric `value` field and persist a stats summary of the data.
    
    * Send some synthetic data directly to the profiler:
    `python ~/rand_gen.py 0 1 1 | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic value`
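    * As data flows, you can watch rows accumulate in the profiler hbase table (one row
      per period per profile) via `echo "count 'profiler'" | hbase shell`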
    * Wait for at least 32 minutes and execute the following via the Stellar REPL:
    ```
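    # (Run these from the Stellar REPL, which ships with Metron:
    #  $METRON_HOME/bin/stellar -z node1:2181)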
    # Grab the profiles from 1 minute ago to 8 minutes ago
    LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('from 1 minute ago to 8 minutes ago')))
    # Looks like 7 were returned, great.  Now try something more complex
    # Grab the profiles in 5 minute windows every 10 minutes from 2 minutes ago to 32 minutes ago:
    #  32 minutes ago til 27 minutes ago should be 5 profiles
    #  22 minutes ago til 17 minutes ago should be 5 profiles
    #  12 minutes ago til 7 minutes ago should be 5 profiles
    # for a total of 15 profiles
    LENGTH(PROFILE_GET('stat', 'global', PROFILE_WINDOW('5 minute window every 10 minutes starting from 2 minutes ago until 32 minutes ago excluding holidays:us')))
    ```
    For me, the following was the result:
    ```
    ```
    * Delete any value index that currently exists (if any do) via `curl -XDELETE "http://localhost:9200/value*"`
    * Wait for a couple of seconds and run 
      * `curl "http://localhost:9200/value*/_search?pretty=true&q=*:*" 2> /dev/null`

      * You should see values in the index with non-zero fields:
         * `num_profiles_enrichment` should be 15
         * `num_profiles_parser` should be 15
         * `mean_enrichment` should be a non-zero double
         * `mean_parser` should be a non-zero double
    For reference, a sample message for me is:
    ```
    ```



