drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] cgivre opened a new pull request #2112: DRILL-7534: Convert HTTPD Format Plugin to EVF
Date Wed, 11 Nov 2020 04:35:09 GMT

cgivre opened a new pull request #2112:
URL: https://github.com/apache/drill/pull/2112


   # [DRILL-7534](https://issues.apache.org/jira/browse/DRILL-7534): Convert HTTPD Format Plugin to EVF
   
   ## Description
   This PR updates the HTTPD format plugin to use the Enhanced Vector Framework (EVF). In
theory, users should notice only a few changes:
   
   1. A new configuration option, `maxErrors`, lets users tune how fault-tolerant Drill
is when reading log files.
   2. Two new implicit fields have been added, `_raw` and `_matched`. They are described
in the docs below.
   3. The plugin now includes a limit pushdown, which significantly improves query times for
queries with limits.
   4. The plugin code is now in the `contrib` folder.
   
   In addition, this PR updates the associated User Agent parsing functions with the latest
version of the underlying libraries.
   
   ## Documentation
   # Web Server Log Format Plugin (HTTPD)
   This plugin enables Drill to read and query httpd (Apache Web Server) and nginx logs natively.
This plugin uses the work by [Niels Basjes](https://github.com/nielsbasjes) which is available
here: https://github.com/nielsbasjes/logparser.
   
   ## Configuration
   There are four fields to configure in order for Drill to read web server logs:
   * **`logFormat`**: The log format string found in your web server configuration.
   * **`timestampFormat`**: The format of the timestamps in your log files.
   * **`extensions`**: The file extension of your web server logs.
   * **`maxErrors`**: Sets the plugin's error tolerance. When set to any value less than `0`,
Drill will ignore all errors.
   
   ```json
   "httpd" : {
     "type" : "httpd",
     "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
     "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
     "maxErrors": 0
   }
   ```
   
   ### Implicit Columns
   Data queried by this plugin will return two implicit columns:
   
   * **`_raw`**: Returns the raw, unparsed log line.
   * **`_matched`**: Returns `true` or `false` depending on whether the line matched the
configured `logFormat` string.
   
   Thus, to see which lines in your log file did not match the config, you could use the
following query:
   
   ```sql
   SELECT _raw
   FROM <data>
   WHERE _matched = false
   ```
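   
   The same implicit column can also give a rough sense of how well a `logFormat` fits a
file. The following is a minimal sketch, not taken from the PR: it counts matched and
unmatched lines by grouping on `_matched`. The `dfs` path is a hypothetical placeholder, and
the query assumes implicit columns can be grouped like ordinary columns.
   
   ```sql
   -- Hypothetical path: substitute your own workspace and log file.
   -- Counts how many lines did and did not match the configured log format.
   SELECT _matched, COUNT(*) AS line_count
   FROM dfs.`/tmp/access_log.httpd`
   GROUP BY _matched
   ```
   
   A large unmatched count would suggest revisiting `logFormat` before relaxing `maxErrors`.
   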
   ## Testing
   Added unit tests for this plugin. Also ran all unit tests for the `parse_user_agent()`
UDF.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


