drill-user mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: REGEX search Operator
Date Thu, 04 Feb 2016 18:15:34 GMT
A tip for navigating large GitHub repos: press 't' while looking at the
folder structure to open a fast file search. Searching for the functions
is a little more complicated in Drill because we generate many of them
to cover all of the types. This means you will find them in source control
as source code templates, not as pure Java source code.

Most of the functions in Drill are in the exec.expr.fn.impl package. Here
is an example of functions that are not generated; you could add the
function to this class or make a new class in the same package [1].

[1] -
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/fn/impl/StringFunctions.java
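
For illustration, here is a minimal sketch of the matching logic such a
function would wrap, assuming java.util.regex as mentioned in the quoted
messages below. The class and method names are hypothetical, and the Drill
function annotations and holder types are deliberately left out, so this is
plain Java rather than Drill's actual function interface.

```java
import java.util.regex.Pattern;

public class RegexContainsSketch {
    // Hypothetical helper, not part of Drill: returns true when `pattern`
    // matches anywhere inside `input`. Null arguments are treated as no match.
    static boolean regexContains(String input, String pattern) {
        if (input == null || pattern == null) {
            return false;
        }
        // find() searches for the pattern anywhere in the input,
        // unlike matches(), which requires the whole input to match.
        return Pattern.compile(pattern).matcher(input).find();
    }

    public static void main(String[] args) {
        String datePattern = "\\d{4}-\\d{2}-\\d{2}";
        System.out.println(regexContains("logged on 2016-02-04", datePattern)); // prints true
        System.out.println(regexContains("no date here", datePattern));         // prints false
    }
}
```

A real implementation would probably compile the pattern once rather than on
every call, since Pattern.compile is relatively expensive.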

On Thu, Feb 4, 2016 at 10:03 AM, Nicolas Paris <niparisco@gmail.com> wrote:

> John, Jason,
>
> 2016-02-04 18:47 GMT+01:00 John Omernik <john@omernik.com>:
>
> > I'd be curious how you are implementing the regex... using Java's regex
> > libraries? etc.
> >
> Yeah, I use java.util.regex
>
>
> > I know one thing with Hive that always bothered me was the need to double
> > escape things.
> >
> > '\d\d\d\d-\d\d-\d\d'  needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can
> > avoid that it would be AWESOME.
> >
> My guess is this comes from the way Java handles strings. All the languages
> I have used need double escaping.
>
>
> > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <altekrusejason@gmail.com>
> > wrote:
>
> The code is here: https://github.com/parisni/drill-simple-contains
> It's disturbing how simple it is...
>
>
>
> > > I think you should actually just put the function in Drill itself. System
> > > native functions are implemented with the same interface as UDFs, because
> > > our mechanism for evaluating them is very efficient (we generate code
> > > blocks by linking together the bodies of the individual functions to
> > > evaluate a complete expression).
> >
> Well, the folder tree is quite impressive (https://github.com/apache/drill).
>
> Which folder is supposed to be "Drill itself"?
>
> > > You can open a JIRA, marking it as a feature request. You can then open a
> > > pull request against the Apache GitHub repo, making sure you follow the
> > > standard format for your commit message, prefixing it with the JIRA number.
> > > Example:
> > > DRILL-XXXX: Feature description
> > >
> > > This will automatically link the PR to your JIRA.
> >
> OK, I will try. Thanks a lot.
>
> > > - Jason
> > >
> > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> > >
> > > > > Jason, I have it working.
> > > > >
> > > > > Just tell me how to proceed with the PR.
> > > > > 1. Where do I put my Maven project? Which folder in my Drill GitHub fork?
> > > > > 2. Do I need a JIRA? How do I proceed?
> > > > >
> > > > > For now, I have only published it on my GitHub account as a separate project.
> > > >
> > > > Thanks
> > > >
> > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com
> >:
> > > >
> > > > > Awesome, thanks!
> > > > >
> > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> > > > >
> > > > > > > Well, I am creating a UDF.
> > > > > > > It's a good exercise.
> > > > > > > I hope to submit a PR soon.
> > > > > >
> > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
> > altekrusejason@gmail.com
> > > >:
> > > > > >
> > > > > > > > I didn't realize that we were lacking this functionality. As the
> > > > > > > > repeated_contains operator handles wildcards, it makes sense to add
> > > > > > > > such a function to Drill.
> > > > > > > >
> > > > > > > > It should be simple to implement. Would someone like to open a JIRA
> > > > > > > > and submit a PR for this?
> > > > > > >
> > > > > > > - Jason
> > > > > > >
> > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <john@omernik.com> wrote:
> > > > > > >
> > > > > > > > > I would like to see something like this as well, even if it's an
> > > > > > > > > included UDF like REGEX(field, pattern) using Java's library for
> > > > > > > > > regex like Hive does.  That would be EXTREMELY helpful.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > > > ANSI SQL doesn't define a regex operator.
> > > > > > > > > > > Neither does Drill.
> > > > > > > > > > >
> > > > > > > > > > Drill has SQL function extensions like "REPEATED_CONTAINS" that
> > > > > > > > > > appear to handle regex. Could a regex operator be provided by a
> > > > > > > > > > new SQL extension? I guess I could create my own functions in
> > > > > > > > > > Java, right? Maybe push them to GitHub then?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Isn't the 'LIKE' operator enough?
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Sadly not, I am looking for complex pattern matching.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > --
> > > > > > > > > > Miura, Masahide
> > > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Nicolas Paris [mailto:niparisco@gmail.com]
> > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > >
> > > > > > > > > > Hello,
> > > > > > > > > >
> > > > > > > > > > > I can't find any reference in the documentation about a regex
> > > > > > > > > > > operator.
> > > > > > > > > > >
> > > > > > > > > > > I would like to be able to query this way:
> > > > > > > > > >
> > > > > > > > > > SELECT *
> > > > > > > > > > FROM xxx
> > > > > > > > > > WHERE  text_field   regexOperator    'regex_pattern';
> > > > > > > > > >
> > > > > > > > > > Thanks for helping,
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
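
On the double-escaping point raised in the quoted messages above, the
following sketch shows what happens on the Java side only. It assumes the
date pattern from the thread; the class and method names are hypothetical.
The doubled backslashes exist only in the Java source literal, and the regex
engine receives single backslashes. Whether a SQL engine also requires
doubling depends on how its parser treats backslashes in string literals,
which this sketch does not cover.

```java
import java.util.regex.Pattern;

public class EscapingSketch {
    // In Java source, each backslash inside a string literal is written twice.
    // The resulting String contains the single-backslash text \d\d\d\d-\d\d-\d\d.
    static String datePattern() {
        return "\\d\\d\\d\\d-\\d\\d-\\d\\d";
    }

    public static void main(String[] args) {
        String pattern = datePattern();
        System.out.println(pattern);          // prints \d\d\d\d-\d\d-\d\d
        System.out.println(pattern.length()); // prints 18
        // Pattern.matches requires the entire input to match the pattern.
        System.out.println(Pattern.matches(pattern, "2016-02-04")); // prints true
    }
}
```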
