From cestella <...@git.apache.org>
Subject [GitHub] metron issue #621: METRON-1001: Allow metron to ingest parser metadata along...
Date Fri, 23 Jun 2017 01:03:41 GMT
Github user cestella commented on the issue:

    https://github.com/apache/metron/pull/621
  
    # TESTING PLAN
    
    These are testing instructions beyond the normal smoke test (i.e. letting data flow through to the indices and checking them).
    
    # Preliminaries
    
    Since I will use the squid topology to pass data through in a controlled
    way, we must install squid and generate one point of data:
    * `yum install -y squid`
    * `service squid start`
    * `squidclient http://www.yahoo.com`
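
    Optionally, confirm that the request actually produced a log line before moving on (a quick sanity check; this assumes squid's default log location):
    ```
    # should print one access.log entry for the yahoo.com request
    tail -n 1 /var/log/squid/access.log
    ```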
    
    Also, set an environment variable to indicate `METRON_HOME`:
    * `export METRON_HOME=/usr/metron/0.4.0` 
    
    # Deploy the squid parser
    * Create the squid kafka topic: `/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --create --topic squid --partitions 1 --replication-factor 1`
    * Start via `$METRON_HOME/bin/start_parser_topology.sh -k node1:6667 -z node1:2181 -s squid`
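
    Optionally, verify the deployment before sending data (a sketch; it assumes the Kafka and Storm CLIs are available on the node):
    ```
    # the squid topic should be listed
    /usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper node1:2181 --list
    # the squid parser topology should show up as ACTIVE
    storm list
    ```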
    
    # Test Cases
    
    ## Test Case 1: Base Case
    * Send squid data through: `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
    * Validate that the message goes through with no fields prefixed with `metadata`: `curl -XPOST 'http://localhost:9200/squid*/_search?pretty'`
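
    As an extra spot check, the raw response can be grepped for the merged-metadata prefix used in the later test cases; with metadata reading off it should not appear (a minimal sketch, assuming `grep` on the node):
    ```
    # expect a count of 0
    curl -s -XPOST 'http://localhost:9200/squid*/_search?pretty' | grep -c 'metron:metadata'
    ```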
    
    ## Test Case 2: Validate Environmental Metadata is available
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on
      metadata reading:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through: `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
    * Validate that the message goes through with a `kafka_topic` field of `SQUID`: 
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic" ]
    }
    '
    ```
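
    If the expected value does not show up, it is worth confirming that the modified config actually landed in ZooKeeper; the same script used for the push can dump what is stored there (a sketch using its DUMP mode):
    ```
    # print the configs currently in ZooKeeper and look for the flag we just pushed
    $METRON_HOME/bin/zk_load_configs.sh -m DUMP -z node1:2181 | grep readMetadata
    ```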
    ## Test Case 3: Validate Environmental Metadata is available and can be merged
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` and turn on
      metadata reading and merging:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "mergeMetadata" : true,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through: `cat /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid`
    * Validate that the message goes through with a `kafka_topic` field of `SQUID` and `metron:metadata:topic` of `squid`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic", "metron:metadata:topic" ]
    }
    '
    ```
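
    With `mergeMetadata` on, the raw metadata value is carried in the message alongside the uppercased copy emitted by the Stellar transformation, so both fields should be present in every hit. A quick way to eyeball them together (a sketch; the same query as above, piped through `grep`):
    ```
    curl -s -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    { "_source" : [ "kafka_topic", "metron:metadata:topic" ] }
    ' | grep -E '"kafka_topic"|"metron:metadata:topic"'
    ```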
    ## Test Case 4: Validate Custom Metadata is available
    We're going to send a custom JSON map containing metadata along in the Kafka key.  The map will have one value, `customer_id`.
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` to turn on metadata reading and turn off merging.  Also, emit a new field called `customer_id` in the field transformation:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "mergeMetadata" : false,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic", "customer_id" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)",
            "customer_id" : "TO_UPPER(metron.metadata.customer_id)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through (a more readable rendering of this one-liner appears at the end of this test case): `IFS=$'\n';for i in $(cat /var/log/squid/access.log);do METADATA="{\"customer_id\" : \"cust2\"}"; echo $METADATA\;$i;done | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid --property="parse.key=true" --property "key.separator=;"`

    * Validate that the message goes through with a `kafka_topic` field of `SQUID` and `customer_id` of `CUST2`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic", "customer_id" ]
    }
    '
    ```
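
    For reference, here is a more readable rendering of the producer one-liner above. It is a sketch of the same logic (same default log path and `;` key separator), not a different procedure:
    ```
    # Attach a JSON metadata map as the Kafka key, separated from the
    # log line (the message value) by ';'.
    METADATA='{"customer_id" : "cust2"}'
    while IFS= read -r line; do
      echo "${METADATA};${line}"
    done < /var/log/squid/access.log | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
      --broker-list node1:6667 --topic squid \
      --property "parse.key=true" --property "key.separator=;"
    ```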
    ## Test Case 5: Validate Custom Metadata is available and can be merged
    As in Test Case 4, we're going to send a custom JSON map containing metadata along in the Kafka key.  The map will have one value, `customer_id`.
    * Modify `$METRON_HOME/config/zookeeper/parsers/squid.json` to turn on metadata reading and turn merging back on.  Also, emit a custom metadata field called `customer_id` in the field transformation:
    ```
    {
      "parserClassName": "org.apache.metron.parsers.GrokParser",
      "sensorTopic": "squid",
      "readMetadata" : true,
      "mergeMetadata" : true,
      "parserConfig": {
        "grokPath": "/patterns/squid",
        "patternLabel": "SQUID_DELIMITED",
        "timestampField": "timestamp"
      },
      "fieldTransformations" : [
        {
          "transformation" : "STELLAR",
          "output" : [ "full_hostname", "domain_without_subdomains", "kafka_topic", "customer_id" ],
          "config" : {
            "full_hostname" : "URL_TO_HOST(url)",
            "domain_without_subdomains" : "DOMAIN_REMOVE_SUBDOMAINS(full_hostname)",
            "kafka_topic" : "TO_UPPER(metron.metadata.topic)",
            "customer_id" : "TO_UPPER(metron.metadata.customer_id)"
          }
        }
      ]
    }
    ```
    * Persist config changes: `$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181`
    * Clear indices: `curl -XDELETE "http://localhost:9200/squid*"`
    * Send squid data through, exactly as in Test Case 4 (see the readable sketch there): `IFS=$'\n';for i in $(cat /var/log/squid/access.log);do METADATA="{\"customer_id\" : \"cust2\"}"; echo $METADATA\;$i;done | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1:6667 --topic squid --property="parse.key=true" --property "key.separator=;"`

    * Validate that the message goes through with a `kafka_topic` field of `SQUID`, `metron:metadata:customer_id` of `cust2`, and `customer_id` of `CUST2`:
    ```
    curl -XPOST 'http://localhost:9200/squid*/_search?pretty' -d '
    {
      "_source" : [ "kafka_topic", "customer_id", "metron:metadata:customer_id" ]
    }
    '
    ```
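
    Once the test cases pass, the test artifacts can optionally be cleaned up (this assumes the parser topology was named after the sensor, which is the `start_parser_topology.sh` default):
    ```
    # remove the test indices and stop the squid parser topology
    curl -XDELETE "http://localhost:9200/squid*"
    storm kill squid
    ```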
    


