metron-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (METRON-1795) General Purpose Regex Parser
Date Thu, 04 Oct 2018 16:06:00 GMT


ASF GitHub Bot commented on METRON-1795:

Github user mmiklavc commented on the issue:
    @jagdeepsingh2 - can you review your recent commits? It looks like there's a bad merge somewhere, considering the jump to 6k+ lines in the diff.

> General Purpose Regex Parser
> ----------------------------
>                 Key: METRON-1795
>                 URL:
>             Project: Metron
>          Issue Type: New Feature
>            Reporter: Jagdeep Singh
>            Priority: Minor
> We have implemented a general purpose regex parser for Metron that we are interested in contributing back to the community.
> While the Metron Grok parser provides some regex-based capability today, the intention of this general purpose regex parser is to:
>  # Allow for more advanced parsing scenarios (specifically, handling devices that emit several log formats and therefore need multiple regular expressions)
>  # Give users and developers of Metron additional options for parsing
>  # With the new parser chaining and regex routing features available in Metron, give additional flexibility to logically separate a flow by using:
>  ## Regex routing to segregate logs at the device level and handle envelope unwrapping
>  ## This general purpose regex parser to parse an entire device type whose logs contain multiple formats (for example, RHEL logs)
> At a high level, the control flow is:
>  # Identify the record type of the incoming raw message.
>  # Find and apply the regular expression for the corresponding record type to extract fields (using named groups).
>  # Apply the message header regex to extract the fields in the header part of the message (using named groups).
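The three steps above can be sketched in Java roughly as follows. This is a minimal, hypothetical sketch, not the parser's actual code: the class `RegexParserSketch` and its methods are illustrative names, and it assumes that the record type is the full text matched by `recordTypeRegex` (as with `kernel`/`syslog` in the config example) and that a record type with no `fields` entry simply gets no extra fields. Because older versions of `java.util.regex` offer no public way to enumerate a pattern's named groups, the sketch scans the pattern string for `(?<name>` openers; this skips lookbehinds such as `(?<=` but could be fooled by an escaped literal parenthesis.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the three-step control flow; not Metron's actual implementation.
class RegexParserSketch {
    // Matches a named-group opener "(?<name>"; lookbehinds "(?<=" and "(?<!" are skipped
    // because Java group names must start with a letter.
    private static final Pattern GROUP_NAME = Pattern.compile("\\(\\?<([a-zA-Z][a-zA-Z0-9]*)>");

    // Lists the named groups declared in a regex, in declaration order.
    static List<String> groupNames(String regex) {
        List<String> names = new ArrayList<>();
        Matcher m = GROUP_NAME.matcher(regex);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Copies every named group that took part in the match into the output map.
    static void copyNamedGroups(String regex, Matcher matcher, Map<String, Object> out) {
        for (String name : groupNames(regex)) {
            String value = matcher.group(name);
            if (value != null) {
                out.put(name, value);
            }
        }
    }

    // Searches the message with the regex and, on a match, copies its named groups.
    static boolean extract(String regex, String message, Map<String, Object> out) {
        Matcher m = Pattern.compile(regex).matcher(message);
        if (!m.find()) {
            return false;
        }
        copyNamedGroups(regex, m, out);
        return true;
    }

    static Map<String, Object> parse(String message, String recordTypeRegex,
                                     Map<String, String> fieldRegexByType,
                                     String messageHeaderRegex) {
        Map<String, Object> fields = new LinkedHashMap<>();
        // Step 1: identify the record type. Assumption: the full matched text (e.g. "kernel")
        // is the record type; named groups in recordTypeRegex also become fields.
        Matcher typeMatcher = Pattern.compile(recordTypeRegex).matcher(message);
        if (!typeMatcher.find()) {
            return fields; // unknown record type: this sketch extracts nothing
        }
        String recordType = typeMatcher.group();
        copyNamedGroups(recordTypeRegex, typeMatcher, fields);

        // Step 2: apply the regex configured for that record type, if there is one.
        String fieldRegex = fieldRegexByType.get(recordType);
        if (fieldRegex != null) {
            extract(fieldRegex, message, fields);
        }

        // Step 3: apply the message header regex shared by all record types.
        extract(messageHeaderRegex, message, fields);
        return fields;
    }
}
```

Run against a line such as `<13>Oct  4 16:06:00 host1 kernel: disk full` with the `kernel` and header regexes from the config below, this sketch fills `process`, `eventInfo`, `syslogpriority`, `timestamp` and `syslogHost`. Using `find()` rather than `matches()` lets lookbehind-anchored patterns like `recordTypeRegex` match in the middle of the line.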
> The parser config uses the following structure:
> {code:java}
> {
>   "recordTypeRegex": "(?<process>(?<=\\s)\\b(kernel|syslog)\\b(?=\\[|:))",
>   "messageHeaderRegex": "(?<syslogpriority>(?<=^<)\\d{1,4}(?=>)).*?(?<timestamp>(?<=>)[A-Za-z]{3}\\s{1,2}\\d{1,2}\\s\\d{1,2}:\\d{1,2}:\\d{1,2}(?=\\s)).*?(?<syslogHost>(?<=\\s).*?(?=\\s))",
>   "fields": [
>     {
>       "recordType": "kernel",
>       "regex": ".*(?<eventInfo>(?<=\\]|\\w\\:).*?(?=$))"
>     },
>     {
>       "recordType": "syslog",
>       "regex": ".*(?<processid>(?<=PID\\s=\\s).*?(?=\\sLine)).*(?<filePath>(?<=64\\s)\/([A-Za-z0-9_-]+\/)+(?=\\w))(?<fileName>.*?(?=\")).*(?<eventInfo>(?<=\").*?(?=$))"
>     }
>   ]
> }
> {code}
> Where:
>  * *recordTypeRegex* is used to identify the record type of a message. It takes a valid regular expression, which may also contain named groups; any named groups are extracted into fields.
>  * *messageHeaderRegex* is a regular expression used to extract fields from the part of the message that is common across all messages (e.g., syslog fields, standard headers).
>  * *fields* is a JSON list of objects, each containing a *recordType* and a *regex*. The regex that is applied is chosen based on the output of *recordTypeRegex* (in the example above, a match on kernel or syslog selects the entry with that *recordType*).
>  * Note: *recordTypeRegex* and *messageHeaderRegex* can also be specified as lists (JSON arrays); the list is evaluated in order until a matching regular expression is found.
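The ordered-list behaviour in the note above amounts to "first match wins". The Java sketch below assumes that semantics and treats a list entry as matching when `Matcher.find()` succeeds anywhere in the message; the class name `OrderedRegexList` and its methods are hypothetical, not part of Metron.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Metron's API) for a regex option given as a JSON array:
// patterns are tried in list order and the first one that matches wins.
class OrderedRegexList {
    private final List<Pattern> patterns = new ArrayList<>();

    OrderedRegexList(List<String> regexes) {
        // Compile once up front so each incoming message does not pay the compilation cost.
        for (String regex : regexes) {
            patterns.add(Pattern.compile(regex));
        }
    }

    // Returns the matcher of the first pattern that finds a match, or empty if none does.
    Optional<Matcher> firstMatch(String message) {
        for (Pattern pattern : patterns) {
            Matcher matcher = pattern.matcher(message);
            if (matcher.find()) {
                return Optional.of(matcher);
            }
        }
        return Optional.empty();
    }
}
```

List order, not position in the message, decides the winner: with `\d+` listed before `[a-z]+`, the message `abc 123` yields `123` even though `abc` appears first.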

This message was sent by Atlassian JIRA
