metron-dev mailing list archives

From cestella <...@git.apache.org>
Subject [GitHub] incubator-metron issue #517: METRON-831: Add lambda expressions and rudiment...
Date Fri, 07 Apr 2017 18:20:06 GMT
Github user cestella commented on the issue:

    https://github.com/apache/incubator-metron/pull/517
  
    I spun this up in full-dev and did the following to test:
    * Download the Alexa top 1m data set
    ```
    wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
    unzip top-1m.csv.zip
    ```
    * Stage import file
    ```
    head -n 10000 top-1m.csv > top-10k.csv
    head -n 10 top-1m.csv > top-10.csv
    hadoop fs -put top-10k.csv /tmp
    ```
    * Create an `extractor.json` for the CSV data by pasting in these contents:
    ```
    {
      "config" : {
        "zk_quorum" : "node1:2181",
        "columns" : {
           "rank" : 0,
           "domain" : 1
        },
        "value_transform" : {
           "domain" : "DOMAIN_REMOVE_TLD(domain)",
           "port" : "es.port"
        },
        "value_filter" : "LENGTH(domain) > 0",
        "indicator_column" : "domain",
        "indicator_transform" : {
           "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
        },
        "indicator_filter" : "LENGTH(indicator) > 0",
        "type" : "top_domains",
        "separator" : ","
      },
      "extractor" : "CSV"
    }
    ```
    * Import the enrichment data
    `echo "truncate 'enrichment'" | hbase shell && $METRON_HOME/bin/flatfile_loader.sh -i ./top-10k.csv -t enrichment -c t -e ./extractor.json -p 5 -b 128 && echo "count 'enrichment'" | hbase shell`
    * Open up the stellar shell via `$METRON_HOME/bin/stellar -z node1` and execute the following:
    ```
    MAP(['google', 'pdf2doc', 'yahoo'], indicator -> MAP_GET('domain', ENRICHMENT_GET('top_domains', indicator, 'enrichment', 't')) )
    ```
    You should see `[google, pdf2doc, yahoo]` returned.
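
    As an optional sanity check before the import step, it can help to confirm that the CSV columns line up with the `columns` mapping in `extractor.json` (`rank` at index 0, `domain` at index 1). The sketch below is a hypothetical helper, not part of Metron; the name `preview_columns` is made up for illustration, and it only splits raw rows on commas without applying any Stellar transform or filter.

    ```shell
    # Hypothetical helper (not part of Metron): print each CSV row the way
    # the extractor's "columns" mapping would read it, i.e. rank from
    # column 0 and domain from column 1 (awk fields $1 and $2).
    preview_columns() {
      awk -F, '{ printf "rank=%s domain=%s\n", $1, $2 }' "$1"
    }
    ```

    Running `preview_columns top-10.csv` against the ten-row sample staged earlier should print one `rank=... domain=...` line per row; if the fields look swapped or empty, the `columns` or `separator` settings probably need adjusting.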


