drill-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: REGEX search Operator
Date Thu, 04 Feb 2016 19:41:21 GMT
Ya, do you see where I am coming from here? Let's let users submit
regex in its pure form if possible, and handle the nuances of Java regex
behind the scenes. I think it would be a great way to make Drill very
accessible and desirable.  I think what happened in Hive is that the regex
commands started out requiring users to escape, and now there are just too
many things using the escaped regex, so the project doesn't want to
change it.
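
A minimal sketch of the escaping issue discussed in this thread, assuming the
pattern text reaches java.util.regex at runtime with its single backslashes
intact. The class and method names (RegexEscapeSketch, contains,
toJavaLiteral) are hypothetical and are not taken from Drill or from the
drill-simple-contains project. It illustrates that the doubled backslash
belongs to string-literal syntax rather than to the regex engine:
Pattern.compile accepts \d\d\d\d-\d\d-\d\d as is, and the "\d" => "\\d"
rewrite quoted below is only needed when the pattern is pasted into Java
source or passes through another layer that unescapes string literals.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; names are illustrative and not part of Drill.
class RegexEscapeSketch {

    // Compiles the pattern text exactly as received and searches the input for it.
    static boolean contains(String input, String userPattern) {
        return Pattern.compile(userPattern).matcher(input).find();
    }

    // Doubles every backslash, i.e. the "\d" => "\\d" rewrite from the thread.
    // Only useful when the result is embedded in a Java string literal.
    static String toJavaLiteral(String userPattern) {
        return userPattern.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        // Build the user's pattern with single backslashes, without relying on
        // literal escapes: (char) 92 is the backslash character.
        String bs = String.valueOf((char) 92);
        String d = bs + "d";
        String userPattern = d + d + d + d + "-" + d + d + "-" + d + d;

        // The same text written as a Java source literal needs doubled
        // backslashes, but both strings are identical at runtime.
        String literal = "\\d\\d\\d\\d-\\d\\d-\\d\\d";
        System.out.println(userPattern.equals(literal));

        // The single-backslash text compiles and matches directly.
        System.out.println(contains("Date Thu, 04 Feb 2016 (2016-02-04)", userPattern));

        // Text suitable for pasting between quotes in Java source.
        System.out.println(toJavaLiteral(userPattern));
    }
}
```

Whether a SQL user has to double the backslashes therefore depends on how the
query parser treats backslashes inside string literals, which this sketch
does not model.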




On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <niparisco@gmail.com> wrote:

> You mean:
> userRegex=>javaRegex
> "\d" => "\\d"
> "\w" => "\\w"
> "\n" => "\n"
> I can do that with a regex, I guess.
> I will give it a try.
>
>
> 2016-02-04 19:37 GMT+01:00 John Omernik <john@omernik.com>:
>
> > So my question on the double escape, is there no way to handle that so the
> > user can use single escaped regex? I know many folks who use big data
> > platforms to test large complex regexes for things like security appliances,
> > and having to convert the regex seems like a lot of work if you consider
> > every user has to do that.  If there was a way to do it in Drill, that
> > would save countless people hours and save many mistakes.
> >
> > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <niparisco@gmail.com>
> > wrote:
> >
> > > John, Jason,
> > >
> > > 2016-02-04 18:47 GMT+01:00 John Omernik <john@omernik.com>:
> > >
> > > > > I'd be curious how you are implementing the regex... using Java's regex
> > > > > libraries? etc.
> > > >
> > > ​Yeah, I use
> > > java.util.regex
> > > ​
> > >
> > >
> > > > > I know one thing with Hive that always bothered me was the need to double
> > > > > escape things.
> > > > >
> > > > > '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we can
> > > > > avoid that it would be AWESOME.
> > > >
> > > ​My guess is this comes from java way to handle strings. All langages I
> > > have used need to double escape.​
> > >
> > >
> > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <altekrusejason@gmail.com>
> > > > > wrote:
> > >
> > > ​code is here: https://github.com/parisni/drill-simple-contains
> > > It's disturbing how it is simple...
> > > ​
> > >
> > >
> > > > > > I think you should actually just put the function in Drill itself. System
> > > > > > native functions are implemented in the same interface as UDFs, because our
> > > > > > mechanism for evaluating them is very efficient (we code generate code
> > > > > > blocks by linking together the bodies of the individual functions to
> > > > > > evaluate a complete expression).
> > > >
> > > ​well the folder tree is quite impressive (
> > https://github.com/apache/drill
> > > ).
> > > ​
> > >
> > > ​what folder is supposed to be "
> > > ​
> > > Drill itself"
> > > ​ ?​
> > > ​
> > >
> > > > > > You can open a JIRA, marking it a feature request. You can open a pull
> > > > > > request against the apache github repo, making sure you follow the standard
> > > > > > format for your commit message, prefixing it with the JIRA number.
> > > > > > Example:
> > > > > > DRILL-XXXX: Feature description
> > > > > >
> > > > > > This will automatically link the PR to your JIRA.
> > > >
> > > ​Ok I will try thanks​
> > >
> > > ​a lot​
> > >
> > > > > - Jason
> > > > >
> > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <niparisco@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > Jason, I have it working,
> > > > > >
> > > > > > Just tell me the way to proceed to PR.
> > > > > > > 1. Where do I put my maven project? Which folder in my drill github fork?
> > > > > > > 2. Do I need a JIRA? How do I proceed?
> > > > > > >
> > > > > > > For now, I have only published it on my github account in a separate project.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <
> > altekrusejason@gmail.com
> > > >:
> > > > > >
> > > > > > > Awesome, thanks!
> > > > > > >
> > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Well I am creating a udf
> > > > > > > > good exercise
> > > > > > > > I hope a PR soon
> > > > > > > >
> > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <
> > > > altekrusejason@gmail.com
> > > > > >:
> > > > > > > >
> > > > > > > > > > I didn't realize that we were lacking this functionality. As the
> > > > > > > > > > repeated_contains operator handles wildcards it makes sense to add such a
> > > > > > > > > > function to drill.
> > > > > > > > > >
> > > > > > > > > > It should be simple to implement, would someone like to open a JIRA and
> > > > > > > > > > submit a PR for this?
> > > > > > > > >
> > > > > > > > > - Jason
> > > > > > > > >
> > > > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <john@omernik.com>
> > > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > > I would like to see something like this as well, even if it's an included
> > > > > > > > > > > UDF like REGEX(field, pattern) using Java's library for regex like Hive
> > > > > > > > > > > does.  That would be EXTREMELY helpful.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <niparisco@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > > > ANSI SQL doesn't define a regex operator.
> > > > > > > > > > > > > Neither does Drill.
> > > > > > > > > > > >
> > > > > > > > > > > ​Drill has SQL functions extension
like
> > > "REPEATED_CONTAINS"​
> > > > > that
> > > > > > > > looks
> > > > > > > > > > to
> > > > > > > > > > > handle regex. regex operator could
be replaced with one
> > new
> > > > SQL
> > > > > > > > > > extension ?
> > > > > > > > > > > I guess I could create my own functions
in java, right
> ?
> > > > Maybe
> > > > > > push
> > > > > > > > it
> > > > > > > > > > into
> > > > > > > > > > > github then ?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > Isn't the 'LIKE' operator enough?
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > ​Sadly not, I'am looking for complex
pattern matching.
> ​
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > > Miura, Masahide
> > > > > > > > > > > >
> > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > From: Nicolas Paris [mailto:niparisco@gmail.com]
> > > > > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > > > >
> > > > > > > > > > > > Hello,
> > > > > > > > > > > >
> > > > > > > > > > > > > I can't find any reference in the documentation about a regex
> > > > > > > > > > > > > operator.
> > > > > > > > > > > >
> > > > > > > > > > > > > I would like to be able to query this way:
> > > > > > > > > > > > >
> > > > > > > > > > > > > SELECT *
> > > > > > > > > > > > > FROM xxx
> > > > > > > > > > > > > WHERE  text_field   regexOperator   'regex_pattern';
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for helping,
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
