drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] nielsbasjes commented on pull request #2112: DRILL-7534: Convert HTTPD Format Plugin to EVF
Date Mon, 16 Nov 2020 10:55:09 GMT

nielsbasjes commented on pull request #2112:
URL: https://github.com/apache/drill/pull/2112#issuecomment-727902776


   As an experiment I added this to your code:
   ```
   diff --git a/contrib/format-httpd/pom.xml b/contrib/format-httpd/pom.xml
   index 10a9e35b4..02ae984ac 100644
   --- a/contrib/format-httpd/pom.xml
   +++ b/contrib/format-httpd/pom.xml
   @@ -51,6 +51,12 @@
          </exclusions>
        </dependency>
    
   +    <dependency>
   +      <groupId>nl.basjes.parse.useragent</groupId>
   +      <artifactId>yauaa-logparser</artifactId>
   +      <version>${yauaa.version}</version>
   +    </dependency>
   +
        <!-- Test dependencies -->
        <dependency>
          <groupId>org.apache.drill.exec</groupId>
   diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
   index 326a074d1..8a0f23063 100644
   --- a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
   +++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
   @@ -17,6 +17,7 @@
     */
    package org.apache.drill.exec.store.httpd;
    
   +import nl.basjes.parse.useragent.dissector.UserAgentDissector;
    import org.apache.drill.common.expression.SchemaPath;
    import org.apache.drill.common.types.TypeProtos;
    import org.apache.drill.common.types.TypeProtos.MinorType;
   @@ -67,6 +68,7 @@ public class HttpdParser {
        } else {
          this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
        }
   +    this.parser.addDissector(new UserAgentDissector());
        this.requestedColumns = scan.getColumns();
    
        if (timestampFormat != null && !timestampFormat.trim().isEmpty()) {
   @@ -119,6 +121,7 @@ public class HttpdParser {
         * because this will be the slowest parsing path possible for the specified format.
         */
        Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
   +    dummy.addDissector(new UserAgentDissector());
        dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths);
    
        for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
   ```
   
   Now I ran into something strange.
   When I run this test code:
   ```
       String sql = "SELECT `request_user-agent`, `request_user-agent_device__name`, `request_user-agent_agent__name__version__major` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 1";
       RowSet results = client.queryBuilder().sql(sql).rowSet();
       results.print();
   ```
   
   I see this (note that both derived values come back doubled):
   ```
   #: `request_user-agent` VARCHAR, `request_user-agent_device__name` VARCHAR, `request_user-agent_agent__name__version__major` VARCHAR
   0: "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", "DesktopDesktop", "Firefox 35Firefox 35"
   ```
   
   At this moment I think this is a bug in the Yauaa Dissector.
   I'm digging into this.
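
   One quick way to flag this symptom in a test is to check whether a returned value is an exact repetition of its own first half. The helper below is only a sketch: `DoubledValueCheck` and `isDoubled` are hypothetical names and are not part of Drill or Yauaa, and the check says nothing about where the duplication comes from.
   ```java
   public class DoubledValueCheck {

     // Hypothetical helper: true when the value is a non-empty string
     // that consists of the same text repeated exactly twice.
     public static boolean isDoubled(String value) {
       if (value == null || value.isEmpty() || value.length() % 2 != 0) {
         return false;
       }
       int half = value.length() / 2;
       return value.substring(0, half).equals(value.substring(half));
     }

     public static void main(String[] args) {
       // The two values are taken from the query output above.
       System.out.println(isDoubled("DesktopDesktop"));       // true
       System.out.println(isDoubled("Firefox 35Firefox 35")); // true
       System.out.println(isDoubled("Desktop"));              // false
     }
   }
   ```
   Legitimate values that happen to repeat themselves (for example `"aa"`) would also be flagged, so this is probably only useful as a temporary debugging aid.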
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


