drill-user mailing list archives

From Nicolas Paris <nipari...@gmail.com>
Subject Re: REGEX search Operator
Date Tue, 09 Feb 2016 15:34:18 GMT
Hi John,

There are actually two jars to put in the folder (both are generated in
/target). Have you restarted Drill afterwards?
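
For what it's worth, here is a minimal sketch of what the error quoted below seems to be checking. The trace fails in Guava's Resources.getResource(), which looks up SimpleContains.java as a classpath resource relative to the class, so the .java source apparently has to be loadable alongside the compiled class (presumably that is what the second jar provides). UdfSourceCheck and its methods are hypothetical names for illustration, not part of Drill.

```java
import java.net.URL;

/** Hypothetical helper (not part of Drill): checks whether a class's .java source is on the classpath. */
final class UdfSourceCheck {

    /** Builds the absolute resource path of a class's source file, e.g. /a/b/C.java. */
    static String sourcePath(Class<?> clazz) {
        return "/" + clazz.getName().replace('.', '/') + ".java";
    }

    /** Returns true if the source file can be resolved as a classpath resource. */
    static boolean hasSourceOnClasspath(Class<?> clazz) {
        URL url = clazz.getResource(sourcePath(clazz));
        return url != null;
    }

    public static void main(String[] args) throws Exception {
        // Pass the fully qualified UDF class name as the first argument;
        // java.lang.String is only a placeholder default.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> clazz = Class.forName(name);
        System.out.println(sourcePath(clazz) + " found: " + hasSourceOnClasspath(clazz));
    }
}
```

Run it with the same jars on the classpath that you copied to the 3rdparty folder. If it reports that the source is not found, the jar containing the sources is probably missing.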





2016-02-09 16:20 GMT+01:00 John Omernik <john@omernik.com>:

> Nicolas, not really sure what's happening here. It compiled fine, but when
> I run it I get this error. The jar is distributed to my bits, I validated
> that... it's in the DRILL_HOME/jars/3rdparty folder on every bit... do I
> need to do something more than that?
>
>
>
> select count(1) from view_myview where srcday = '2016-02-05' and
> contains(domain_name, 'com');
> Error: SYSTEM ERROR: IllegalArgumentException: resource
> /org/apache/drill/contrib/function/SimpleContains.java relative to
> org.apache.drill.contrib.function.SimpleContains not found.
>
> Fragment 1:44
>
> [Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]
>
>   (java.lang.IllegalArgumentException) resource
> /org/apache/drill/contrib/function/SimpleContains.java relative to
> org.apache.drill.contrib.function.SimpleContains not found.
>     com.google.common.base.Preconditions.checkArgument():119
>     com.google.common.io.Resources.getResource():203
>     org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
>     org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
>     org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
>     org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
>     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
>     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
>     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
>     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
>     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
>     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
>     org.apache.drill.common.expression.FunctionHolderExpression.accept():47
>     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
>     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
>     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
>     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
>     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
>     org.apache.drill.common.expression.BooleanOperator.accept():36
>     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
>     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
>     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
>     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
>     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
>     org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
>     org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
>     org.apache.drill.exec.expr.ClassGenerator.addExpr():227
>     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
>     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
>     org.apache.drill.exec.record.AbstractRecordBatch.next():142
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1595
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
>
> On Fri, Feb 5, 2016 at 2:39 AM, Nicolas Paris <niparisco@gmail.com> wrote:
>
> > John,
> >
> > Sorry for that, this already works as expected.
> > Give it a try, it is easy to deploy.
> >
> > SELECT first_name FROM cp.`employee.json` WHERE contains(first_name,'\w+')
> > LIMIT 5;
> > first_name |
> > -----------|
> > Sheri      |
> > Derrick    |
> > Michael    |
> > Maya       |
> > Roberta    |
> >
> >
> > 2016-02-04 20:41 GMT+01:00 John Omernik <john@omernik.com>:
> >
> > > Ya, do you see where I am coming from here? Let's let the users submit
> > > regex in the pure form if possible, and code the nuances of Java regex
> > > behind the scenes. I think it would be a great way to make Drill very
> > > accessible and desirable. I think what happened in Hive is that the regex
> > > commands started with the users having to escape, and now there are just
> > > too many things using the escaped regex, and the project doesn't want to
> > > adjust.
> > >
> > >
> > >
> > >
> > > On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris <niparisco@gmail.com> wrote:
> > >
> > > > You mean:
> > > > userRegex=>javaRegex
> > > > "\d" => "\\d"
> > > > "\w" => "\\w"
> > > > "\n" => "\n"
> > > > I can probably do that with a regex.
> > > > I will give it a try.
> > > >
> > > >
> > > > 2016-02-04 19:37 GMT+01:00 John Omernik <john@omernik.com>:
> > > >
> > > > > So my question on the double escape: is there no way to handle that so
> > > > > the user can use single-escaped regex? I know many folks who use big
> > > > > data platforms to test large, complex regexes for things like security
> > > > > appliances, and having to convert the regex seems like a lot of work if
> > > > > you consider that every user has to do that. If there was a way to do it
> > > > > in Drill, that would save countless people hours and prevent many mistakes.
> > > > >
> > > > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <niparisco@gmail.com> wrote:
> > > > >
> > > > > > John, Jason,
> > > > > >
> > > > > > 2016-02-04 18:47 GMT+01:00 John Omernik <john@omernik.com>:
> > > > > >
> > > > > > > I'd be curious how you are implementing the regex... using Java's
> > > > > > > regex libraries? etc.
> > > > > > >
> > > > > > Yeah, I use java.util.regex.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > I know one thing with Hive that always bothered me was the need
> > > > > > > to double escape things.
> > > > > > >
> > > > > > > '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If
> > > > > > > we can avoid that it would be AWESOME.
> > > > > > >
> > > > > > My guess is that this comes from the way Java handles strings. All
> > > > > > languages I have used need double escaping.
> > > > > >
> > > > > >
> > > > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse <altekrusejason@gmail.com> wrote:
> > > > > >
> > > > > > The code is here: https://github.com/parisni/drill-simple-contains
> > > > > > It's disturbing how simple it is...
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > I think you should actually just put the function in Drill itself.
> > > > > > > > System native functions are implemented in the same interface as
> > > > > > > > UDFs, because our mechanism for evaluating them is very efficient
> > > > > > > > (we code generate code blocks by linking together the bodies of the
> > > > > > > > individual functions to evaluate a complete expression).
> > > > > > >
> > > > > > Well, the folder tree is quite impressive (https://github.com/apache/drill).
> > > > > >
> > > > > > Which folder is supposed to be "Drill itself"?
> > > > > >
> > > > > > > > You can open a JIRA, marking it a feature request. You can open a
> > > > > > > > pull request against the Apache GitHub repo, making sure you follow
> > > > > > > > the standard format for your commit message, prefixing with the JIRA
> > > > > > > > number in the format
> > > > > > > > Example:
> > > > > > > > DRILL-XXXX: Feature description
> > > > > > > >
> > > > > > > > This will automatically link the PR to your JIRA.
> > > > > > >
> > > > > > OK, I will try. Thanks a lot.
> > > > > >
> > > > > > > > - Jason
> > > > > > > >
> > > > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Jason, I have it working,
> > > > > > > > >
> > > > > > > > > Just tell me how to proceed with the PR.
> > > > > > > > > 1. Where do I put my Maven project? Which folder in my Drill GitHub fork?
> > > > > > > > > 2. Do I need a JIRA? How do I proceed?
> > > > > > > > >
> > > > > > > > > For now, I have only published it on my GitHub account in a separate
> > > > > > > > > project.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com>:
> > > > > > > > >
> > > > > > > > > > Awesome, thanks!
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Well, I am creating a UDF.
> > > > > > > > > > > Good exercise.
> > > > > > > > > > > I hope to submit a PR soon.
> > > > > > > > > > >
> > > > > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com>:
> > > > > > > > > > >
> > > > > > > > > > > > I didn't realize that we were lacking this functionality. As the
> > > > > > > > > > > > repeated_contains operator handles wildcards it makes sense to add
> > > > > > > > > > > > such a function to Drill.
> > > > > > > > > > > >
> > > > > > > > > > > > It should be simple to implement, would someone like to open a JIRA
> > > > > > > > > > > > and submit a PR for this?
> > > > > > > > > > > >
> > > > > > > > > > > > - Jason
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <john@omernik.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > I would like to see something like this as well, even if it's an
> > > > > > > > > > > > > included UDF like REGEX(field, pattern) using Java's library for
> > > > > > > > > > > > > regex like Hive does. That would be EXTREMELY helpful.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris <niparisco@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > ANSI SQL doesn't define a regex operator.
> > > > > > > > > > > > > > > Neither does Drill.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > Drill has SQL function extensions like "REPEATED_CONTAINS" that
> > > > > > > > > > > > > > appear to handle regex. Could a regex operator be provided as a
> > > > > > > > > > > > > > new SQL extension? I guess I could create my own functions in
> > > > > > > > > > > > > > Java, right? Maybe push them to GitHub then?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Isn't the 'LIKE' operator enough?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sadly not, I am looking for complex pattern matching.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > Miura, Masahide
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > > From: Nicolas Paris [mailto:niparisco@gmail.com]
> > > > > > > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I can't find any reference in the documentation about a regex
> > > > > > > > > > > > > > > operator.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would like to be able to query this way:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > SELECT *
> > > > > > > > > > > > > > > FROM xxx
> > > > > > > > > > > > > > > WHERE text_field regexOperator 'regex_pattern';
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for helping,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
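
On the double-escaping question discussed in the quoted thread, a minimal Java sketch may help. It assumes the pattern string reaches java.util.regex unchanged. The doubled backslash is only needed in Java source literals, because the Java compiler consumes one level of escaping; a pattern that arrives at runtime (for example from a query string) already holds the single backslash. RegexEscapeDemo and its contains method are hypothetical names for illustration, not the code in drill-simple-contains.

```java
import java.util.regex.Pattern;

/** Hypothetical demo of Java string-literal escaping versus regex escaping. */
final class RegexEscapeDemo {

    /** Sketch of a contains-style check: true if the regex matches anywhere in the input. */
    static boolean contains(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // In Java source the backslash is doubled, but the resulting string
        // holds only two characters: a backslash and the letter d.
        String fromLiteral = "\\d";
        System.out.println("literal length: " + fromLiteral.length());

        // A pattern built at runtime needs no extra escaping: these three
        // characters are exactly the regex \w+ as a user would type it.
        String fromRuntime = new String(new char[] {'\\', 'w', '+'});
        System.out.println("matches Sheri: " + contains("Sheri", fromRuntime));

        // The date pattern from the thread, written as a Java literal.
        System.out.println("matches date: "
                + contains("2016-02-05", "\\d\\d\\d\\d-\\d\\d-\\d\\d"));
    }
}
```

This is consistent with the query above, where contains(first_name,'\w+') works with a single backslash, although how Drill's SQL parser treats backslashes in string literals is not something this sketch verifies.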
