metron-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (METRON-204) Field Transformation Domain Specific Language
Date Fri, 03 Jun 2016 02:10:59 GMT

    [ https://issues.apache.org/jira/browse/METRON-204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15313478#comment-15313478
] 

ASF GitHub Bot commented on METRON-204:
---------------------------------------

GitHub user cestella opened a pull request:

    https://github.com/apache/incubator-metron/pull/142

    METRON-204: Field Transformation Domain Specific Language

    Similar to the domain specific query language, it would be nice to have a domain specific
    language for transformations that can be used as an optional FieldTransformation implementation.
    
    * A fixed set of transformation functions:
       * `TO_LOWER(string)` : Transforms the first argument to a lowercase string
       * `TO_UPPER(string)` : Transforms the first argument to an uppercase string
       * `TO_STRING(string)` : Transforms the first argument to a string
       * `TO_INTEGER(x)` : Transforms the first argument to an integer
       * `TO_DOUBLE(x)` : Transforms the first argument to a double
       * `TRIM(string)` : Trims whitespace from both sides of a string.
       * `JOIN(list, delim)` : Joins the components of the list with the specified delimiter
       * `SPLIT(string, delim)` : Splits the string by the delimiter.  Returns a list.
       * `GET_FIRST(list)` : Returns the first element of the list
       * `GET_LAST(list)` : Returns the last element of the list
       * `GET(list, i)` : Returns the i'th element of the list (i is 0-based).
       * `MAP_GET(key, map, default)` : Returns the value associated with the key in the map.
         If the key does not exist, the default will be returned. If the default is unspecified,
         null will be returned.
       * `DOMAIN_TO_TLD(domain)` : Returns the TLD of the domain.
       * `DOMAIN_REMOVE_TLD(domain)` : Removes the TLD from the domain.
       * `REMOVE_TLD(domain)` : Removes the TLD from the domain.
       * `URL_TO_HOST(url)` : Returns the host from a URL
       * `URL_TO_PROTOCOL(url)` : Returns the protocol from a URL
       * `URL_TO_PORT(url)` : Returns the port from a URL
       * `URL_TO_PATH(url)` : Returns the path from a URL
       * `TO_EPOCH_TIMESTAMP(dateTime, format, timezone)` : Returns the epoch timestamp of
         the `dateTime` given the `format`. If the format does not include a timezone and you wish to
         assume one, you may optionally specify the `timezone`.
    * A FieldTransformer implementation, `MTL`, which exposes the transformation language
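
To make the string, list and map semantics in the list above concrete, here is a minimal Python sketch. The function names below are hypothetical stand-ins that only mirror the descriptions; they are not Metron's implementation, and null handling for missing arguments is not modeled because the description does not specify it.

```python
# Hypothetical Python stand-ins for a few of the listed functions.
# They follow the one-line descriptions above and nothing more.

def to_lower(s):
    return s.lower()

def trim(s):
    # Trims whitespace from both sides
    return s.strip()

def split(s, delim):
    # Returns a list
    return s.split(delim)

def join(items, delim):
    return delim.join(str(x) for x in items)

def get_first(items):
    return items[0]

def get_last(items):
    return items[-1]

def get(items, i):
    # i is 0-based, as stated in the list above
    return items[i]

def map_get(key, mapping, default=None):
    # Falls back to `default` (None when unspecified) if the key is absent
    return mapping.get(key, default)

parts = split(trim("  Foo.Example.COM "), ".")
print(parts)
print(to_lower(get_last(parts)))
print(join(parts, "-"))
print(map_get("paris", {"nyc": "EST"}, "UTC"))
```

Composing the calls, as in `to_lower(get_last(parts))`, is the same nesting style the transformation language uses in the example config.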
    
    Example MTL transformation:
    
    Consider the following sensor parser config to add three new fields to a
    message:
    * `utc_timestamp` : The unix epoch timestamp computed from the `timestamp` field, using the `dc`
      field (the data center the message comes from) and a `dc2tz` map from data centers
      to timezones
    * `url_host` : The host associated with the url in the `url` field
    * `url_protocol` : The protocol associated with the url in the `url` field
    
    ```
    {
    ...
        "fieldTransformations" : [
              {
               "transformation" : "MTL"
              ,"output" : [ "utc_timestamp", "url_host", "url_protocol" ]
              ,"config" : {
                "utc_timestamp" : "TO_EPOCH_TIMESTAMP(timestamp, 'yyyy-MM-dd HH:mm:ss', MAP_GET(dc, dc2tz, 'UTC') )"
               ,"url_host" : "URL_TO_HOST(url)"
               ,"url_protocol" : "URL_TO_PROTOCOL(url)"
                          }
              }
                          ]
       ,"parserConfig" : {
          "dc2tz" : {
                    "nyc" : "EST"
                   ,"la" : "PST"
                   ,"london" : "UTC"
                    }
        }
    }
    ```
    
    Note that the `dc2tz` map is in the parser config, so it is accessible
    in the functions.
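
As a rough illustration of what the config above is meant to compute, the following Python sketch derives the three output fields for a sample message. Several things here are assumptions rather than facts from the description: the helper names `to_epoch_timestamp` and `transform`, the sample message, the fixed standard-time offsets used for "EST" and "PST", the hand translation of the `yyyy-MM-dd HH:mm:ss` pattern into `strptime` syntax, and the choice of milliseconds as the epoch unit.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Assumption: the abbreviations in dc2tz are treated as fixed offsets.
# How the DSL itself resolves "EST" or "PST" is not stated in the description.
OFFSETS = {
    "EST": timezone(timedelta(hours=-5)),
    "PST": timezone(timedelta(hours=-8)),
    "UTC": timezone.utc,
}

def to_epoch_timestamp(date_time, tz_name="UTC"):
    # 'yyyy-MM-dd HH:mm:ss' translated by hand to strptime syntax
    naive = datetime.strptime(date_time, "%Y-%m-%d %H:%M:%S")
    aware = naive.replace(tzinfo=OFFSETS[tz_name])
    # Milliseconds are an assumption; the description does not give the unit
    return int(aware.timestamp() * 1000)

def transform(message, parser_config):
    # Mirrors MAP_GET(dc, dc2tz, 'UTC'): unknown data centers fall back to UTC
    tz_name = parser_config["dc2tz"].get(message.get("dc"), "UTC")
    url = urlparse(message["url"])
    out = dict(message)
    out["utc_timestamp"] = to_epoch_timestamp(message["timestamp"], tz_name)
    out["url_host"] = url.hostname
    out["url_protocol"] = url.scheme
    return out

parser_config = {"dc2tz": {"nyc": "EST", "la": "PST", "london": "UTC"}}
message = {
    "timestamp": "2016-01-01 00:00:00",
    "dc": "nyc",
    "url": "https://www.example.com/index.html",
}
result = transform(message, parser_config)
print(result["utc_timestamp"], result["url_host"], result["url_protocol"])
```

The sample date falls in January so that daylight saving time does not affect the result under either reading of "EST".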


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cestella/incubator-metron METRON-204

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-metron/pull/142.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #142
    
----
commit cb9e925a199b3ee4b377f955066a959a91fc87c2
Author: cstella <cestella@gmail.com>
Date:   2016-06-03T02:05:33Z

    METRON-204: Field Transformation Domain Specific Language

----


> Field Transformation Domain Specific Language
> ---------------------------------------------
>
>                 Key: METRON-204
>                 URL: https://issues.apache.org/jira/browse/METRON-204
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Casey Stella
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
