From user-return-7538-apmail-drill-user-archive=drill.apache.org@drill.apache.org Tue Feb 28 16:48:53 2017 Return-Path: X-Original-To: apmail-drill-user-archive@www.apache.org Delivered-To: apmail-drill-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CB24919590 for ; Tue, 28 Feb 2017 16:48:53 +0000 (UTC) Received: (qmail 46249 invoked by uid 500); 28 Feb 2017 16:48:53 -0000 Delivered-To: apmail-drill-user-archive@drill.apache.org Received: (qmail 46180 invoked by uid 500); 28 Feb 2017 16:48:53 -0000 Mailing-List: contact user-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@drill.apache.org Delivered-To: mailing list user@drill.apache.org Received: (qmail 46167 invoked by uid 99); 28 Feb 2017 16:48:53 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Feb 2017 16:48:53 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id ABD03C094B for ; Tue, 28 Feb 2017 16:48:52 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.799 X-Spam-Level: * X-Spam-Status: No, score=1.799 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=omernik-com.20150623.gappssmtp.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id PKw-_ZZxWp4i for ; Tue, 28 Feb 2017 16:48:51 +0000 (UTC) Received: from mail-qk0-f175.google.com (mail-qk0-f175.google.com [209.85.220.175]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id B7A0C5F4E5 for ; Tue, 28 Feb 2017 16:48:50 +0000 (UTC) Received: by mail-qk0-f175.google.com with SMTP id n127so27324331qkf.0 for ; Tue, 28 Feb 2017 08:48:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=omernik-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=3fB9U184DbPl4XLsBuZUsMPXP2IictrLERjDX26taTE=; b=jh33T7XkM+V+GaRuJgjnJxOJMN29b2sgSeWNRjd7Qk40Q1xxqxDgtrDZjTmW71FZkW SYZsewwMFOdEQozo2xMrKzLqo6DYxRiLmDhPn8VUlvEEg+U+pL/1GFDd/T95u2nkpD+k cIKQFl/JuyY/H1dpO9NLDSDNC0RCtRLtVmnhhdGfG4XYGVWHtNnpiNn3lWEvBkEKCQlv XTCVoVZbj7SwSySDATIXyuwk9KGsainhi1o8HLGGuQmHE7OeEI4j2xGYIj1Yc8N98Gg0 BWGm6Rm9BnidHptUanxExz9a+t+YKAHXqQ07qwWGMHr820qVHgeUADMvbR6DdkAGdatQ FIEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=3fB9U184DbPl4XLsBuZUsMPXP2IictrLERjDX26taTE=; b=S3N5wx7Q7sE0xYQyjnBHVWobMpQ2Z2o8UB77bLwFj5uq6LMXm67JHQjRCgkiTyMLvM VKy8da/AXC8ItrWZy0uNuDYY5Or82xQILqZrHvo5n1rwCYvAuVlaL9rWYe/bnbzpTTVB Es8iIColZ4YlLDRs9sL9nzJXMo7E7yfrEr7uxmBCBnDuxdyyLuOp14k+CeDXN3UMkQKQ qYkgm95oWX+1pKSBNu+tBZtrWw/b1dmdP9tU/NY86D88uAHgukb6K9CFQyUCjsiRJoGg bM/SQOSMzxWlM9dXkNOXJFa++j2Eww4oiX4Ts2o1aDTO6HqCbQrcYgU3SzRbG9SEwyR8 kIrQ== X-Gm-Message-State: AMke39mHfSAuE/Eia8f8Q3wH15698TgqNRrfwctkAAYKz8G2Dx7/6Hsa/F+EkEu3vPGKYkyGcalITch/Xbp03A== X-Received: by 10.55.16.230 with SMTP id 99mr3934338qkq.295.1488300529921; Tue, 28 Feb 2017 08:48:49 -0800 (PST) MIME-Version: 1.0 Received: by 10.140.30.38 with HTTP; Tue, 28 Feb 2017 08:48:29 -0800 (PST) In-Reply-To: References: From: John Omernik Date: Tue, 28 Feb 2017 10:48:29 -0600 Message-ID: Subject: Re: Dealing with bad data when trying to do date computations To: user Content-Type: multipart/alternative; boundary=001a11459ca267cf59054999f8ad --001a11459ca267cf59054999f8ad Content-Type: text/plain; charset=UTF-8 Thanks Charles, that worked even on my 1.8. Drill folks: We need to do some documentation updates. We've added functions (like REGEXP_MATCHES, and it's in 1.8, so I am not sure where it was added) and other functions like SPLIT and yet no mention in https://drill.apache.org/docs/string-manipulation/ So, yes, this is "meh" work compared to programming all the cool things in Drill. But there are a number of reasons that this needs to be done besides common practices. 1. Users, and more importantly POTENTIAL users get frustrated when trying to use drill for the first time. Coming from other Big Data systems like Hive, not having Regex, split, and other functions is frustrating. But what is more frustrating is to find that they actually exist, and are just not documented. Nothing will turn people off faster. 2. Without the knowledge of these functions, people try "hacky" work arounds like what I did, killing performance, and setting Drill in a bad light. 3. It provides an over all feeling of lack of effort by the community. I am know that resources are not unlimited, and these things need to be addressed by "someone" but issues like this are really important for getting more people into the community who may be able to help contribute! 4. I think as part of developer review and pull requests that add functions/functionality should require a pull request to also provide a documentation update. This helps to ensure that the docs keep up to date, as well as keeping users appraised of what is happening... i.e. it's a good "feeling" to see a great tool like Drill "improving" with new functionality. Please, folks, we need to do some one time clean up (go back through pull requests to ensure all functions are documented up to now) and then then get processes in place to ensure ongoing updates. Thanks John Omernik On Tue, Feb 28, 2017 at 10:15 AM, Charles Givre wrote: > Hi John, > I believe that Drill 1.9 includes a REGEXP_MATCHES( , ) > function which does what you'd expect it to. I'm not sure when this was > introduced, so it maybe in earlier versions of Drill. > Best, > -- C > > On Tue, Feb 28, 2017 at 11:03 AM, John Omernik wrote: > > > I have a data set that has birthdays in YYYY-MM-DD format. > > > > Most of this data is great. I am trying to compute the age using > > > > EXTRACT(year from age(dob)) > > > > > > But some of my data is crapola... let's call it alternative data... > > > > > > When I try to run the Extract function, I get > > > > Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear > > must be in the range [1,12] > > > > Fragment 5:17 > > > > [Error Id: 62f90784-c9f4-4362-9710-a37464fc801a on drillnode:20005] > > > > > > I've tried an ugly where clause, and this works: > > > > where > > > > (dob LIKE '%-01-%' or dob LIKE '%-02-%' or dob LIKE '%-03-%' or dob LIKE > > '%-04-%' or dob LIKE '%-05-%' or dob LIKE '%-06-%' or dob LIKE '%-07-%' > or > > dob LIKE '%-08-%' or dob LIKE '%-09-%' or > > > > dob LIKE '%-1-%' or dob LIKE '%-2-%' or dob LIKE '%-3-%' or dob LIKE > > '%-4-%' or dob LIKE '%-5-%' or dob LIKE '%-6-%' or dob LIKE '%-7-%' or > dob > > LIKE '%-8-%' or dob LIKE '%-9-%' or > > > > dob LIKE '%-10-%' or dob LIKE '%-11-%' or dob LIKE '%-12-%') > > > > > > But WOW is that ugly. I could add the jar for regex contains, and make it > > much easier (do we have a regex search function built into drill? I think > > we should at this point...) > > > > > > Is there another way to say try the extra function, and catch a failure, > > and ignore on failure? What if we had a cast function that returned NULL > > on failure so we could use it in the where clause? Any other more > elegant > > ways to handle this? > > > > > > Thanks! > > > > > > John > > > --001a11459ca267cf59054999f8ad--