cgivre opened a new pull request #2112:
URL: https://github.com/apache/drill/pull/2112
# [DRILL-7534](https://issues.apache.org/jira/browse/DRILL-7534): Convert HTTPD Format Plugin to EVF
## Description
This PR updates the HTTPD format plugin to use the Enhanced Vector Framework (EVF). In
theory, users should notice only a few changes:
1. A new configuration option, `maxErrors`, lets users tune how fault tolerant Drill is
when reading log files.
2. Two new implicit fields have been added, `_raw` and `_matched`. They are described
in the docs below.
3. The plugin now includes a limit pushdown which significantly improves query times for
queries with limits.
4. The plugin code is now in the `contrib` folder.
In addition, this PR updates the associated User Agent parsing functions with the latest
version of the underlying libraries.
## Documentation
# Web Server Log Format Plugin (HTTPD)
This plugin enables Drill to read and query httpd (Apache Web Server) and nginx logs natively.
It is built on the logparser library by [Niels Basjes](https://github.com/nielsbasjes), which is
available here: https://github.com/nielsbasjes/logparser.
## Configuration
There are four fields you can configure for Drill to read web server logs:
* **`logFormat`**: The log format string found in your web server configuration.
* **`timestampFormat`**: The format of time stamps in your log files.
* **`extensions`**: The file extension of your web server logs.
* **`maxErrors`**: Sets the plugin's error tolerance. When set to any value less than `0`,
Drill will ignore all errors.
```json
"httpd" : {
  "type" : "httpd",
  "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
  "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
  "maxErrors" : 0
}
```
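With a configuration along these lines in place, a first query might look like the sketch below. The `dfs` storage plugin, the file path, and the `.httpd` extension are hypothetical and assume that `extensions` has been set to include `httpd`. The `LIMIT` clause is the kind of query the limit pushdown is meant to speed up.

```sql
-- Hypothetical path: adjust the storage plugin, workspace, and file name
-- to match your own environment.
SELECT *
FROM dfs.`/var/log/httpd/access.httpd`
LIMIT 10
```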
### Implicit Columns
Data queried by this plugin will return two implicit columns:
* **`_raw`**: Returns the raw, unparsed log line.
* **`_matched`**: Returns `true` or `false` depending on whether the line matched the
config string.
Thus, to see which lines in your log file did not match the config, you could use the
following query:
```sql
SELECT _raw
FROM <data>
WHERE _matched = false
```
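To gauge how well the format string fits a file overall, the same implicit column could also drive a count. This is a minimal sketch that reuses the `<data>` placeholder from the query above and assumes `_matched` can be grouped like an ordinary boolean column:

```sql
-- Count the lines that did and did not match the configured format string.
SELECT _matched, COUNT(*) AS line_count
FROM <data>
GROUP BY _matched
```

A large count for `_matched = false` would suggest that `logFormat` does not fit the file, in which case inspecting `_raw` for those rows is a reasonable next step.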
## Testing
Added unit tests for this plugin. Also ran all unit tests for the `parse_user_agent()`
UDF.