drill-dev mailing list archives

From Charles Givre <cgi...@gmail.com>
Subject Re: [jira] [Created] (DRILL-6628) Possible incorporation of Twitter text processing UDFs into Drill-proper
Date Mon, 23 Jul 2018 15:39:58 GMT
Hi Bob, 
I was inspired a little by OSQuery and MySQL, but I’ve written a lot of UDFs that extend
basic SQL functionality and add other capabilities to Drill. IMHO, since Drill isn’t a database,
functions like these are a very helpful addition and will get more people using Drill.  I’d
personally be very interested in your Cyber-ish UDFs.

FYI, there is a collection of network analysis functions already in Drill:
Networking Functions
Drill supports the following networking functions to facilitate network analysis using Drill:

inet_aton(<ip>): Converts an IPv4 address into an integer
inet_ntoa(<int>): Converts an integer IP into dotted decimal notation
in_network(<ip>, <cidr>): Returns true if the IP address is in the given CIDR block
address_count(<cidr>): Returns the number of IPs in a given CIDR block
broadcast_address(<cidr>): Returns the broadcast address for a given CIDR block
netmask(<cidr>): Returns the netmask for a given CIDR block
low_address(<cidr>): Returns the first address in a given CIDR block
high_address(<cidr>): Returns the last address in a given CIDR block
url_encode(<url>): Returns a URL-encoded string
url_decode(<url>): Decodes a URL-encoded string
is_valid_IP(<ip>): Returns true if the IP is a valid IP address
is_private_ip(<ip>): Returns true if the IP is a private IPv4 address
is_valid_IPv4(<ip>): Returns true if the IP is a valid IPv4 address
is_valid_IPv6(<ip>): Returns true if the IP is a valid IPv6 address
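
For anyone unfamiliar with these, here is a rough Python sketch of what a few of the functions above compute. It uses only the standard-library ipaddress module and borrows the Drill function names purely for illustration. It is not Drill's implementation, and edge-case behavior (invalid input, IPv6 arguments) is an assumption on my part.

```python
import ipaddress


def inet_aton(ip: str) -> int:
    """Convert a dotted-decimal IPv4 address into its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def inet_ntoa(value: int) -> str:
    """Convert a 32-bit integer back into dotted-decimal notation."""
    return str(ipaddress.IPv4Address(value))


def in_network(ip: str, cidr: str) -> bool:
    """Return True if the address falls inside the CIDR block."""
    # strict=False tolerates host bits in the CIDR (an assumption, not Drill behavior).
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr, strict=False)


def broadcast_address(cidr: str) -> str:
    """Return the broadcast address of the CIDR block."""
    return str(ipaddress.ip_network(cidr, strict=False).broadcast_address)


def netmask(cidr: str) -> str:
    """Return the netmask of the CIDR block in dotted-decimal notation."""
    return str(ipaddress.ip_network(cidr, strict=False).netmask)


def is_valid_ipv4(ip: str) -> bool:
    """Return True if the string parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(ip)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(inet_aton("192.168.0.1"))
    print(inet_ntoa(3232235521))
    print(in_network("192.168.0.77", "192.168.0.0/24"))
    print(broadcast_address("192.168.0.0/24"))
    print(netmask("192.168.0.0/24"))
```

I left out address_count, low_address, and high_address because the list above does not say whether the network and broadcast addresses are counted, and is_private_ip because Python's notion of "private" may be broader than Drill's.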

I’ve been working on a few other security-related hacks, including Drill UDFs that do DNS
lookups and retrieve Whois data.  Also, I assume you saw DRILL-6104, which is a generic regex/log
format plugin.  I’m working on a syslog/RFC-5424 format plugin for Drill, which I intend
to submit for Drill 1.15.  Anyway, my point is that, IMHO, Drill is a great tool for cyber data
analysis, and the more goodness we have officially part of Drill, the better.


> On Jul 23, 2018, at 08:44, Bob Rudis (JIRA) <jira@apache.org> wrote:
> Bob Rudis created DRILL-6628:
> --------------------------------
>             Summary: Possible incorporation of Twitter text processing UDFs into Drill-proper
>                 Key: DRILL-6628
>                 URL: https://issues.apache.org/jira/browse/DRILL-6628
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Functions - Drill
>            Reporter: Bob Rudis
> Per the User mailing list thread [https://mail-archives.apache.org/mod_mbox/drill-user/201807.mbox/%3Caef1979d-f454-4691-8607-8267adf2ac1e%40getmailbird.com%3E],
> submitting the possibility for the inclusion of drill-twitter-text [https://github.com/hrbrmstr/drill-twitter-text]
> into Drill-proper.
> Shifting the conversation here since it's more appropriate and CC'ing [~cgivre] who posited
> the idea.
> On the one hand, there are function groups such as "Phonetic" and "String Distance" so
> there's precedent for inclusion of "non-boring-SQL"-like functions into Drill-proper. On the
> other hand, this is a small addition of a handful of functions for Twitter text so would this
> be too niche for a "Twitter" function group?
> As noted in the mailing list thread, there are more "cyber"-ish UDFs on the way (still
> kinda hoping for that guava upgrade that I saw mentioned in various places in jira), so would
> the Twitter components be in a "Cyber" group?
> Regardless, I'll take a look at how the functions are structured in the Drill source
> tree and gladly machinate the necessary changes/inclusions if this discussion
> results in that decision.
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
