John,

About the escape, I will explore that question.
About your query, you may try this pattern:

select count(1) from view_mydata where srcday = '2016-02-05' and
contains(domain_name, '.*\\.com$');

2016-02-09 17:19 GMT+01:00 John Omernik:

> I copied both files and it appears to work, but after some testing, I am
> getting inconsistent results, see below. I ran three queries. First, a LIKE
> looking for domain names that end in .com (domain_name like '%.com'), which
> returned a count of 9.8 million. Then I tried contains with '\.com$',
> which means "ends in dot com". This failed (this goes to my earlier comments
> about really wishing we did not do double escaping as normal... for users,
> double escaping is NOT normal, thus doing that to meet Java's issues is
> hard... not sure how to handle it, it may be a tough issue, but it really
> seems like something worth exploring).
>
> I then did contains(domain_name, '\\.com$'). This took quite a bit longer
> and returned 0, so I am not really sure how the function is working at this
> point. Thoughts?
>
> John
>
> > select count(1) from view_mydata where srcday = '2016-02-05' and
> domain_name like '%.com';
> +----------+
> |  EXPR$0  |
> +----------+
> | 9810609  |
> +----------+
> 1 row selected (123.673 seconds)
>
> > select count(1) from view_mydata where srcday = '2016-02-05' and
> contains(domain_name, '\.com$');
> Error: SYSTEM ERROR: ExpressionParsingException: Expression has syntax
> error! line 1:79:mismatched input '' expecting CParen
>
> Fragment 1:13
>
> [Error Id: 8e46bed4-f9ba-444f-a3aa-2f57db5ae34f on node3:31010]
> (state=,code=0)
>
> > select count(1) from view_mydata where srcday = '2016-02-05' and
> contains(domain_name, '\\.com$');
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (201.391 seconds)
>
> On Tue, Feb 9, 2016 at 9:34 AM, Nicolas Paris wrote:
>
> > Hi John,
> >
> > There are actually two jars to put in the folder (generated in /target).
> > Have you restarted Drill afterwards?
> >
> > 2016-02-09 16:20 GMT+01:00 John Omernik:
> >
> > > Nicolas, not really sure what's happening here. It compiled fine, but when
> > > I run it I get this error. The jar is distributed to my bits, I validated
> > > that... it's in the DRILL_HOME/jars/3rdparty folder on every bit... do I
> > > need to do something more than that?
> > >
> > > > select count(1) from view_myview where srcday = '2016-02-05' and
> > > contains(domain_name, 'com');
> > > Error: SYSTEM ERROR: IllegalArgumentException: resource
> > > /org/apache/drill/contrib/function/SimpleContains.java relative to
> > > org.apache.drill.contrib.function.SimpleContains not found.
> > >
> > > Fragment 1:44
> > >
> > > [Error Id: 30c11047-9896-4e16-a207-e3cce79c9db5 on node1:31010]
> > >
> > > (java.lang.IllegalArgumentException) resource
> > > /org/apache/drill/contrib/function/SimpleContains.java relative to
> > > org.apache.drill.contrib.function.SimpleContains not found.
> > >     com.google.common.base.Preconditions.checkArgument():119
> > >     com.google.common.io.Resources.getResource():203
> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.get():127
> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.checkInit():99
> > >     org.apache.drill.exec.expr.fn.FunctionInitializer.getMethod():81
> > >     org.apache.drill.exec.expr.fn.DrillFuncHolder.meth():94
> > >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.setupBody():50
> > >     org.apache.drill.exec.expr.fn.DrillSimpleFuncHolder.renderEnd():80
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitFunctionHolderExpression():203
> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitFunctionHolderExpression():1078
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():816
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitFunctionHolderExpression():796
> > >     org.apache.drill.common.expression.FunctionHolderExpression.accept():47
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanAnd():690
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitBooleanOperator():172
> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitBooleanOperator():1092
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():836
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitBooleanOperator():796
> > >     org.apache.drill.common.expression.BooleanOperator.accept():36
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitReturnValueExpression():551
> > >     org.apache.drill.exec.expr.EvaluationVisitor$EvalVisitor.visitUnknown():344
> > >     org.apache.drill.exec.expr.EvaluationVisitor$ConstantFilter.visitUnknown():1328
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():1027
> > >     org.apache.drill.exec.expr.EvaluationVisitor$CSEFilter.visitUnknown():796
> > >     org.apache.drill.exec.physical.impl.filter.ReturnValueExpression.accept():56
> > >     org.apache.drill.exec.expr.EvaluationVisitor.addExpr():105
> > >     org.apache.drill.exec.expr.ClassGenerator.addExpr():227
> > >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():187
> > >     org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():109
> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > >     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():94
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > >     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > >     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.buildSchema():100
> > >     org.apache.drill.exec.record.AbstractRecordBatch.next():142
> > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> > >     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> > >     java.security.AccessController.doPrivileged():-2
> > >     javax.security.auth.Subject.doAs():415
> > >     org.apache.hadoop.security.UserGroupInformation.doAs():1595
> > >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> > >     org.apache.drill.common.SelfCleaningRunnable.run():38
> > >     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> > >     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> > >     java.lang.Thread.run():745 (state=,code=0)
> > >
> > > On Fri, Feb 5, 2016 at 2:39 AM, Nicolas Paris wrote:
> > >
> > > > John,
> > > >
> > > > Sorry for that, this already works as expected.
> > > > Give it a try, it is very easy to deploy.
> > > >
> > > > SELECT first_name FROM cp.`employee.json` WHERE
> > > > contains(first_name,'\w+')
> > > > LIMIT 5;
> > > > first_name |
> > > > -----------|
> > > > Sheri      |
> > > > Derrick    |
> > > > Michael    |
> > > > Maya       |
> > > > Roberta    |
> > > >
> > > > 2016-02-04 20:41 GMT+01:00 John Omernik:
> > > >
> > > > > Ya, do you see where I am coming from here? Let's let the users submit
> > > > > regex in the pure form if possible, and code the nuances of Java regex
> > > > > behind the scenes. I think it would be a great way to make Drill very
> > > > > accessible and desirable. I think what happened in Hive is that the regex
> > > > > commands started with the users having to escape, and now there are just
> > > > > too many things using the escaped regex and the project doesn't want to
> > > > > adjust.
> > > > >
> > > > > On Thu, Feb 4, 2016 at 1:38 PM, Nicolas Paris wrote:
> > > > >
> > > > > > You mean:
> > > > > > userRegex=>javaRegex
> > > > > > "\d" => "\\d"
> > > > > > "\w" => "\\w"
> > > > > > "\n" => "\n"
> > > > > > I can do that thanks to regex, I guess.
> > > > > > I will give it a try.
> > > > > >
> > > > > > 2016-02-04 19:37 GMT+01:00 John Omernik:
> > > > > >
> > > > > > > So my question on the double escape: is there no way to handle that so
> > > > > > > the user can use single-escaped regex? I know many folks who use big data
> > > > > > > platforms to test large complex regexes for things like security
> > > > > > > appliances, and having to convert the regex seems like a lot of work if
> > > > > > > you consider that every user has to do that. If there was a way to do it
> > > > > > > in Drill, that would save countless people hours and save many mistakes.
> > > > > > >
> > > > > > > On Thu, Feb 4, 2016 at 12:03 PM, Nicolas Paris <niparisco@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > John, Jason,
> > > > > > > >
> > > > > > > > 2016-02-04 18:47 GMT+01:00 John Omernik:
> > > > > > > >
> > > > > > > > > I'd be curious how you are implementing the regex... using Java's
> > > > > > > > > regex libraries? etc.
> > > > > > > >
> > > > > > > > Yeah, I use java.util.regex
> > > > > > > >
> > > > > > > > > I know one thing with Hive that always bothered me was the need to
> > > > > > > > > double escape things.
> > > > > > > > >
> > > > > > > > > '\d\d\d\d-\d\d-\d\d' needed to be '\\d\\d\\d\\d-\\d\\d-\\d\\d'. If we
> > > > > > > > > can avoid that it would be AWESOME.
> > > > > > > >
> > > > > > > > My guess is that this comes from the way Java handles strings.
> > > > > > > > All languages I have used need to double escape.
> > > > > > > >
> > > > > > > > > On Thu, Feb 4, 2016 at 11:37 AM, Jason Altekruse
> > > > > > > > > <altekrusejason@gmail.com> wrote:
> > > > > > > >
> > > > > > > > The code is here: https://github.com/parisni/drill-simple-contains
> > > > > > > > It's disturbing how simple it is...
> > > > > > > >
> > > > > > > > > > I think you should actually just put the function in Drill itself.
> > > > > > > > > > System native functions are implemented in the same interface as
> > > > > > > > > > UDFs, because our mechanism for evaluating them is very efficient
> > > > > > > > > > (we code generate code blocks by linking together the bodies of the
> > > > > > > > > > individual functions to evaluate a complete expression).
> > > > > > > >
> > > > > > > > Well, the folder tree is quite impressive
> > > > > > > > (https://github.com/apache/drill).
> > > > > > > >
> > > > > > > > Which folder is supposed to be "Drill itself"?
> > > > > > > >
> > > > > > > > > > You can open a JIRA, marking it a feature request. You can open a
> > > > > > > > > > pull request against the apache github repo, making sure you follow
> > > > > > > > > > the standard format for your commit message, prefixing with the JIRA
> > > > > > > > > > number in the format
> > > > > > > > > > Example:
> > > > > > > > > > DRILL-XXXX: Feature description
> > > > > > > > > >
> > > > > > > > > > This will automatically link the PR to your JIRA.
> > > > > > > >
> > > > > > > > Ok, I will try, thanks a lot.
> > > > > > > >
> > > > > > > > > > - Jason
> > > > > > > > > >
> > > > > > > > > > On Thu, Feb 4, 2016 at 8:44 AM, Nicolas Paris <niparisco@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Jason, I have it working,
> > > > > > > > > > >
> > > > > > > > > > > Just tell me the way to proceed to PR.
> > > > > > > > > > > 1. Where do I put my maven project? Which folder in my drill github
> > > > > > > > > > > fork?
> > > > > > > > > > > 2. Do I need a JIRA? How do I proceed?
> > > > > > > > > > >
> > > > > > > > > > > For now, I only published it on my github account in a separate
> > > > > > > > > > > project.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > 2016-02-04 16:52 GMT+01:00 Jason Altekruse <altekrusejason@gmail.com>:
> > > > > > > > > > >
> > > > > > > > > > > > Awesome, thanks!
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Feb 4, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Well, I am creating a UDF.
> > > > > > > > > > > > > Good exercise.
> > > > > > > > > > > > > I hope for a PR soon.
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2016-02-04 16:37 GMT+01:00 Jason Altekruse
> > > > > > > > > > > > > <altekrusejason@gmail.com>:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > I didn't realize that we were lacking this functionality. As the
> > > > > > > > > > > > > > repeated_contains operator handles wildcards it makes sense to
> > > > > > > > > > > > > > add such a function to drill.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It should be simple to implement, would someone like to open a
> > > > > > > > > > > > > > JIRA and submit a PR for this?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > - Jason
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Feb 2, 2016 at 8:56 AM, John Omernik <john@omernik.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would like to see something like this as well, even if it's
> > > > > > > > > > > > > > > an included UDF like REGEX(field, pattern) using Java's library
> > > > > > > > > > > > > > > for regex like Hive does. That would be EXTREMELY helpful.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Feb 2, 2016 at 6:55 AM, Nicolas Paris
> > > > > > > > > > > > > > > <niparisco@gmail.com> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ANSI SQL doesn't define regex operator.
> > > > > > > > > > > > > > > > > Drill neither.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Drill has SQL function extensions like "REPEATED_CONTAINS" that
> > > > > > > > > > > > > > > > look like they handle regex. Could a regex operator be replaced
> > > > > > > > > > > > > > > > with one new SQL extension?
> > > > > > > > > > > > > > > > I guess I could create my own functions in Java, right?
> > > > > > > > > > > > > > > > Maybe push it to github then?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Isn't the 'LIKE' operator enough?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Sadly not, I am looking for complex pattern matching.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > Miura, Masahide
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > > > > > > > From: Nicolas Paris [mailto:niparisco@gmail.com]
> > > > > > > > > > > > > > > > > Sent: Tuesday, February 02, 2016 9:04 PM
> > > > > > > > > > > > > > > > > To: user@drill.apache.org
> > > > > > > > > > > > > > > > > Subject: REGEX search Operator
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I can't find any reference in the documentation about a regex
> > > > > > > > > > > > > > > > > operator.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I would like to be able to query this way:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > SELECT *
> > > > > > > > > > > > > > > > > FROM xxx
> > > > > > > > > > > > > > > > > WHERE text_field regexOperator 'regex_pattern';
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks for helping,
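
For readers following the escaping discussion in this thread, here is a minimal Java sketch of how java.util.regex (the library Nicolas says the UDF uses) treats the patterns that were tried. The class and method names are hypothetical and are not taken from drill-simple-contains. The thread does not say whether the UDF calls find() or matches(), or how Drill's SQL layer handles backslashes before the pattern reaches Java, so the sketch only shows the pure-Java behaviour of each case and should not be read as an explanation of the 0-row result.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Drill or of drill-simple-contains.
public class RegexEscapeDemo {

    // find(): true if the pattern matches anywhere inside the input.
    static boolean containsMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // matches(): true only if the pattern matches the whole input.
    static boolean fullMatch(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String domain = "example.com";

        // In Java source, "\\.com$" is the 6-character string \.com$ ,
        // which is the regex a user would naturally write.
        String singleEscaped = "\\.com$";
        System.out.println(singleEscaped.length());                // 6
        System.out.println(containsMatch(domain, singleEscaped));  // true
        System.out.println(fullMatch(domain, singleEscaped));      // false
        System.out.println(fullMatch(domain, ".*\\.com$"));        // true

        // If two backslashes reach the regex engine unchanged, the pattern
        // means: a literal backslash, any character, then "com" at the end.
        String doubleEscaped = "\\\\.com$";
        System.out.println(containsMatch(domain, doubleEscaped));  // false
    }
}
```

The userRegex => javaRegex table earlier in the thread is about Java source literals only: a pattern that arrives at runtime as a string (for example from a SQL query) already holds the single-backslash form and can be passed to Pattern.compile as it is.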