From: "Siprell, Stefan"
To: "drill-dev@incubator.apache.org"
Date: Sun, 20 Jan 2013 10:51:25 +0100
Subject: Re: Introduction
Good morning Jacques,

I have now added some queries based on your great feedback. I got a little creative with SQL extensions for DataValues and documented them inline with my queries. I stumbled on a question regarding indexes and DataValues: will an index point to a record, or to a subrecord element? I wrote this down with my query examples, but it seems to be a more general question, so I thought I should repeat it on the dev mailing list. I started drafting my queries using like expressions, but found this unnatural, so I moved towards inlining the hierarchical elements into the statement itself.

I also understood that Drill is more of an analytical platform. So my understanding is that we want to access hierarchical data, but we do not want to generate any. Besides, running reports, charts, or tables (typical client applications) on hierarchical data is a mess, as the toolset simply doesn't support it. For this reason, I would focus on generating flat results for the time being.
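Jacques's reply (quoted below) describes a record as a DataValue that is either a map, an ordered list, or a scalar, and the paragraph above proposes producing flat results for now. A minimal Java sketch of both ideas, assuming hypothetical classes `DataValue`, `MapValue`, `ListValue`, `ScalarValue`, and `Flattener` (illustrative names, not Drill's actual API): it walks one nested record and emits a single flat row whose column names are dotted/bracketed paths. Note this is different from the BigQuery-style flatten operator discussed later in the thread, which would produce one row per list element rather than indexed columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns one nested record into one flat row keyed by
// path strings such as "person.children[0].age".
class Flattener {

    static Map<String, Object> toFlatRow(DataValue record) {
        Map<String, Object> row = new LinkedHashMap<>();
        walk("", record, row);
        return row;
    }

    private static void walk(String path, DataValue v, Map<String, Object> row) {
        if (v instanceof ScalarValue) {
            // Leaf: the accumulated path becomes the column name.
            row.put(path, ((ScalarValue) v).value);
        } else if (v instanceof ListValue) {
            // Ordered list: each element gets an indexed column prefix.
            List<DataValue> items = ((ListValue) v).items;
            for (int i = 0; i < items.size(); i++) {
                walk(path + "[" + i + "]", items.get(i), row);
            }
        } else if (v instanceof MapValue) {
            // Map: each key extends the path with a dot.
            for (Map.Entry<String, DataValue> e : ((MapValue) v).fields.entrySet()) {
                String child = path.isEmpty() ? e.getKey() : path + "." + e.getKey();
                walk(child, e.getValue(), row);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, DataValue> child = new LinkedHashMap<>();
        child.put("age", new ScalarValue(7));
        List<DataValue> children = new ArrayList<>();
        children.add(new MapValue(child));
        Map<String, DataValue> person = new LinkedHashMap<>();
        person.put("name", new ScalarValue("Ada"));
        person.put("children", new ListValue(children));
        Map<String, DataValue> top = new LinkedHashMap<>();
        top.put("person", new MapValue(person));

        // Prints {person.name=Ada, person.children[0].age=7}
        System.out.println(toFlatRow(new MapValue(top)));
    }
}

// Hypothetical record model following the three DataValue kinds in the thread.
interface DataValue {}

final class ScalarValue implements DataValue {
    final Object value;
    ScalarValue(Object value) { this.value = value; }
}

final class ListValue implements DataValue {
    final List<DataValue> items;
    ListValue(List<DataValue> items) { this.items = items; }
}

final class MapValue implements DataValue {
    final Map<String, DataValue> fields;
    MapValue(Map<String, DataValue> fields) { this.fields = fields; }
}
```

A flat row like this is what typical reporting and charting tools can consume directly, which is the motivation given above for starting with flat output.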
If desired, I can start writing an ANTLR grammar for the syntax I am working on, to make the output more robust. I had a look at the SQL parser you guys mentioned, but I don't think it would work for my kind of queries, as they drastically extend SQL 2003. All we want to do is map the AST to your logical plan, right? I think this can be done quite easily with just ANTLR and some Java classes.

Stefan

On 20.01.2013, at 00:56, Jacques Nadeau wrote:

> Many of these haven't been finalized since we're still working on code.
> That being said, let me share what my thoughts have been to date.
>
>> SQL row maps to a Drill record?
> Correct
>
>> And Drill would not have a flat sibling structure of nodes, a.k.a. columns,
>> but hierarchical nodes?
> Correct. My general thinking is that a record is a DataValue.
> A DataValue can be one of three major types: a map (string:DataValue), an
> ordered list (DataValues[]), or a scalar DataValue. Most commonly, the
> first DataValue in a record would be a map. In the case of SQL/flat data
> (e.g. CSV), this map would only contain scalar values.
>
>> Will Drill access the contents of a record in a stream or document manner?
>> How large may a record be?
> For the first version of Drill, I was thinking that a record must fit
> entirely in memory. Functions can interact with an entire record as they
> choose.
>
>> Can I use XPath-like functions to access nodes?
> Generally, we hope so. 'Like' being the operative word here. The path
> expressions that we're thinking of using are substantially simpler than the
> expressiveness of XPath. Ultimately, I could see people creating a parser
> which takes in XQueries and converts them to Drill logical plans. That
> being said, our goal is more for analytical queries than document
> transformations.
>
>> All of the Google BigQuery Cookbook examples seem to generate flat
>> output; is this a limitation?
> In Drill, we don't plan to limit to flat output.
> For v1, we're looking at
> supporting hierarchical expressions in SQL 'as' aliases. We're also
> looking at supporting selections at any level of hierarchy, not just the
> leaf level. We then combine these with a concept of collision behavior
> control so that you can control how to merge multiple nested output values
> into a single output tree. These will allow one to build a nested output
> object. These are preliminary thoughts. We need to write more and discuss
> more.
>
> One thing to remember is that one of Drill's goals is to be flexible.
> Ultimately, different query languages may support different subsets of
> operations and no one query language may include all operators.
>
> Hope that makes sense.
>
> Jacques
>
> On Sat, Jan 19, 2013 at 3:11 PM, Siprell, Stefan wrote:
>
>> Aaaah, studying the BigQuery docs helped. May I assume that a SQL row
>> maps to a Drill record? And Drill would not have a flat sibling structure
>> of nodes, a.k.a. columns, but hierarchical nodes? All of the Google
>> BigQuery Cookbook examples seem to generate flat output; is this a
>> limitation? If not, how would I generate my hierarchical output model
>> without using a Groovy builder or XQuery :-)
>>
>> Stefan
>>
>> Sent from my iPad
>>
>> On 20.01.2013, at 00:01, "Jacques Nadeau" wrote:
>>
>>> Fair enough. Starting with BigQuery syntax or SQL 2003 and flat data
>>> structures will work fine. I'll try to write up something meaningful
>>> about SQL and nested data structures.
>>>
>>> Jacques
>>>
>>> On Sat, Jan 19, 2013 at 2:54 PM, Siprell, Stefan wrote:
>>>
>>>> Should I not just use this here as a reference?
>>>>
>>>> https://developers.google.com/bigquery/docs/query-reference
>>>>
>>>> I am a bit stumped, to be honest. I am trying to think how to use SQL
>>>> efficiently on nested data structures.
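The question of how to use SQL on nested data structures is what the dotted/bracketed identifiers (e.g. `person.children[0].age`) and the `ARRAY_LENGTH` function proposed in Jacques's 19:51 reply address. A hedged Java sketch of how such a path might be resolved, assuming records are represented as plain `java.util.Map`/`List`/scalar values (a stand-in for DataValue) and a hypothetical `PathEval` helper; the tiny parser handles only `name`, `.name`, and `[index]` segments and is not Drill's or Optiq's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves dotted/bracketed paths such as
// "person.children[0].age" against a record built from Map, List and scalars.
class PathEval {

    // Splits "a.b[2].c" into segments "a", "b", 2, "c":
    // Strings are map keys, Integers are list indexes.
    static List<Object> parse(String path) {
        List<Object> segments = new ArrayList<>();
        int i = 0;
        while (i < path.length()) {
            char c = path.charAt(i);
            if (c == '.') {
                i++;
            } else if (c == '[') {
                int close = path.indexOf(']', i);
                if (close < 0) throw new IllegalArgumentException("Unclosed [ in " + path);
                segments.add(Integer.parseInt(path.substring(i + 1, close).trim()));
                i = close + 1;
            } else {
                int end = i;
                while (end < path.length() && path.charAt(end) != '.' && path.charAt(end) != '[') end++;
                segments.add(path.substring(i, end));
                i = end;
            }
        }
        return segments;
    }

    // Walks the record; returns null when a key or index does not exist,
    // roughly like SQL NULL for a missing nested field.
    static Object get(Object record, String path) {
        Object current = record;
        for (Object seg : parse(path)) {
            if (seg instanceof String && current instanceof Map) {
                current = ((Map<?, ?>) current).get(seg);
            } else if (seg instanceof Integer && current instanceof List) {
                List<?> list = (List<?>) current;
                int idx = (Integer) seg;
                current = idx >= 0 && idx < list.size() ? list.get(idx) : null;
            } else {
                return null;
            }
        }
        return current;
    }

    // Sketch of a function over nested values, in the spirit of ARRAY_LENGTH.
    static Integer arrayLength(Object record, String path) {
        Object v = get(record, path);
        return v instanceof List ? ((List<?>) v).size() : null;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("age", 7);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("age", 4);
        Map<String, Object> person = new LinkedHashMap<>();
        person.put("children", Arrays.asList(first, second));
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("person", person);

        System.out.println(get(record, "person.children[0].age"));  // 7
        System.out.println(arrayLength(record, "person.children")); // 2
        System.out.println(get(record, "person.children[5].age"));  // null
    }
}
```

Returning null for a missing key or out-of-range index is one possible design choice; an engine could equally raise an error or apply a schema-driven default.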
>>>>
>>>> Sent from my iPad
>>>>
>>>> On 19.01.2013, at 19:51, "Jacques Nadeau" <jacques.drill@gmail.com> wrote:
>>>>
>>>> * I drew a UML diagram. I saw that there is some Gliffy support in
>>>> Confluence, but the free account is pretty much useless. I used OmniGraffle
>>>> to draw the diagram, but this is payware on the Mac. Is there some usable
>>>> freeware alternative? Don't mention Tigris :-)
>>>>
>>>> I don't have any suggestions on this.
>>>>
>>>> * I have some ideas on the queries, but I am not sure how I should specify
>>>> them. Should I use pseudo-SQL? Prose? I saw the syntax document on the
>>>> server; is it mature enough that I should attempt to use its syntax? Is
>>>> there a BNF or, better, an ANTLR grammar I can use to check my syntax?
>>>> Should I draw one up while I am at it?
>>>>
>>>> I suggest you target SQL2003 (including subqueries). We're looking at how
>>>> to use Optiq's SQL parser for Drill. Our goal is to stay as close as
>>>> possible to that spec but add the following extensions:
>>>> - Add a flatten operator similar to BigQuery syntax
>>>> - Support use of selection and output identifiers using dotted/bracketed
>>>> notation. E.g. "select person.children[0].age as
>>>> output.profile.firstChildAge"
>>>> - Support new functions that can accept nested values, including
>>>> collections and maps. For example "select ARRAY_LENGTH(person.children)".
>>>>
>>>> Once you have some SQL examples, the next goal would be to manually
>>>> translate those into Logical Plan syntax. This syntax is still maturing,
>>>> so I'd take it to the SQL stage first.
>>>>
>>>> Stefan
>>>>
>>>> On 19.01.2013, at 02:05, Jacques Nadeau <jacques.drill@gmail.com> wrote:
>>>>
>>>> The wiki is up. Michael and Stefan, it would be great if you started
>>>> putting your use case thoughts there.
>>>>
>>>> Jacques
>>>>
>>>> On Sun, Jan 13, 2013 at 3:31 PM, Ted Dunning wrote:
>>>>
>>>> Ahh... yes. That wiki. I will ping infra again.
>>>>
>>>> (I was attaching your comment to the Wikipedia use case and had confused
>>>> myself)
>>>>
>>>> On Sun, Jan 13, 2013 at 2:53 PM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
>>>>
>>>> What do you need from me?
>>>>
>>>> Maybe I've overlooked something, in which case I apologize. I was
>>>> wondering if the public wiki for Drill is available, where Stefan, I, and
>>>> others can write up the use cases and queries.
>>>>
>>>> Cheers,
>>>> Michael
>>>>
>>>> --
>>>> Michael Hausenblas
>>>> Ireland, Europe
>>>> http://mhausenblas.info/
>>>>
>>>> On 13 Jan 2013, at 14:20, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>>
>>>> What do you need from me?
>>>>
>>>> On Sun, Jan 13, 2013 at 11:06 AM, Michael Hausenblas <michael.hausenblas@gmail.com> wrote:
>>>>
>>>> As soon as we hear back from Ted about the wiki, we'll work there.
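Jacques's notes in this thread on hierarchical `as` aliases (e.g. `output.profile.firstChildAge`) and "collision behavior control" leave the exact semantics open. One hedged Java sketch of the idea, assuming a hypothetical `OutputBuilder` class and an illustrative `Collision` policy enum (neither is part of Drill): each selected value is written into a nested output map at the dotted path given by its alias, and the policy decides what happens when two values land on the same leaf.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: assembles a nested output tree from dotted SQL aliases.
class OutputBuilder {

    // Illustrative policies for two values targeting the same output leaf.
    enum Collision { FAIL, KEEP_FIRST, OVERWRITE, COLLECT }

    private final Map<String, Object> root = new LinkedHashMap<>();
    private final Collision policy;

    OutputBuilder(Collision policy) { this.policy = policy; }

    // Writes value at the alias path, creating intermediate maps as needed.
    @SuppressWarnings("unchecked")
    OutputBuilder put(String alias, Object value) {
        String[] parts = alias.split("\\.");
        Map<String, Object> node = root;
        for (int i = 0; i < parts.length - 1; i++) {
            Object next = node.get(parts[i]);
            if (next == null) {
                next = new LinkedHashMap<String, Object>();
                node.put(parts[i], next);
            } else if (!(next instanceof Map)) {
                throw new IllegalStateException("Alias " + alias + " crosses leaf " + parts[i]);
            }
            node = (Map<String, Object>) next;
        }
        String leaf = parts[parts.length - 1];
        if (!node.containsKey(leaf)) {
            node.put(leaf, value);
            return this;
        }
        Object existing = node.get(leaf);
        switch (policy) {
            case FAIL:
                throw new IllegalStateException("Duplicate output field " + alias);
            case KEEP_FIRST:
                break;
            case OVERWRITE:
                node.put(leaf, value);
                break;
            case COLLECT:
                // An existing list (e.g. from an earlier collision) is extended;
                // otherwise the old and new values become a two-element list.
                List<Object> merged = new ArrayList<>();
                if (existing instanceof List) merged.addAll((List<Object>) existing);
                else merged.add(existing);
                merged.add(value);
                node.put(leaf, merged);
                break;
        }
        return this;
    }

    Map<String, Object> build() { return root; }

    public static void main(String[] args) {
        Map<String, Object> out = new OutputBuilder(Collision.COLLECT)
                .put("output.profile.firstChildAge", 7)
                .put("output.profile.name", "Ada")
                .put("output.profile.firstChildAge", 9)
                .build();
        // Prints {output={profile={firstChildAge=[7, 9], name=Ada}}}
        System.out.println(out);
    }
}
```

Making the policy an explicit parameter mirrors the thread's point that users should control how multiple nested values merge into a single output tree; which policies Drill would actually offer was still undecided at the time.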