drill-dev mailing list archives

From "Siprell, Stefan" <stefan.sipr...@exxeta.de>
Subject Re: Introduction
Date Sun, 20 Jan 2013 09:51:25 GMT
Good morning Jacques,

I have now added some queries based on your great feedback. I got a little creative with SQL
extensions for DataValues and documented them inline with my queries. I stumbled on a question
regarding indexes and DataValues: will an index point to a record, or to a subrecord element?
I wrote this down with my query examples, but it seems to be a more general question, so I
thought I should repeat it on the dev mailing list. I started drafting my queries using LIKE
expressions, but found this unnatural, so I moved towards inlining the hierarchical elements
into the statement itself.
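
As an illustration only, here is a minimal Python sketch of what resolving an inlined
hierarchical element could look like. It walks a dotted/bracketed path such as
person.children[0].age (the notation quoted further down in this thread) through a nested
record. The function name resolve_path and the sample record are assumptions made for this
example; they are not Drill APIs and say nothing about how Drill implements path expressions.

```python
import re

# Hypothetical sample record: a map whose values are maps, lists, or scalars.
record = {
    "person": {
        "name": "Ann",
        "children": [{"age": 7}, {"age": 4}],
    }
}

# A path token is either a field name or a bracketed list index such as [0].
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def resolve_path(value, path):
    """Walk a dotted/bracketed path through nested dicts and lists."""
    pos = 0
    while pos < len(path):
        if path[pos] == ".":
            # Dots only separate tokens; skip them.
            pos += 1
            continue
        match = _TOKEN.match(path, pos)
        if match is None:
            raise ValueError(f"bad path at offset {pos}: {path!r}")
        name, index = match.groups()
        # A name selects a map entry, an index selects a list element.
        value = value[name] if name is not None else value[int(index)]
        pos = match.end()
    return value


print(resolve_path(record, "person.children[0].age"))  # 7
```

The sketch returns whatever sits at the end of the path, so a path may stop at a map or a
list as well as at a scalar leaf.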

I also understood that Drill is more of an analytical platform. So my understanding is that we
want to access hierarchical data, but we do not want to generate any. Besides, trying to run
reports, charts or tables (typical client applications) on hierarchical data is a mess, as
the toolset simply doesn't support it. For this reason, I would focus on generating flat
results for the time being.
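
To show what a flat result over hierarchical input might look like, here is a minimal Python
sketch that turns one nested record into one flat row per element of a repeated field. It is
similar in spirit to the flatten operator mentioned in the quoted thread, but the function
name flatten_record, the "children.age" column naming, and the sample data are assumptions
made for this example, not Drill or BigQuery behavior.

```python
def flatten_record(record, repeated_key):
    """Yield one flat row per element of record[repeated_key].

    All other top-level fields are copied into every row. Each element of
    the repeated field is assumed to be a map; its fields are prefixed with
    the repeated key, e.g. "children.age".
    """
    parent = {k: v for k, v in record.items() if k != repeated_key}
    for child in record.get(repeated_key, []):
        row = dict(parent)
        for key, value in child.items():
            row[f"{repeated_key}.{key}"] = value
        yield row


# Hypothetical nested record with one repeated field.
person = {"name": "Ann", "children": [{"age": 7}, {"age": 4}]}
rows = list(flatten_record(person, "children"))
for row in rows:
    print(row)
```

Note that in this sketch a record whose repeated field is empty or missing produces no rows
at all; whether such records should instead yield a single row with empty child columns is a
design choice left open here.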

If desired, I can start writing an ANTLR grammar for the stuff I am working on, to make the
output more robust. I had a look at the SQL parser you guys mentioned, but I don't think it
would work on my kind of queries, as they drastically extend SQL 2003. All we want to do is
map the AST to your logical plan, right? I think this can be done quite easily using just
ANTLR and some Java classes.

Stefan

On 20.01.2013, at 00:56, Jacques Nadeau <jacques.drill@gmail.com> wrote:

> Many of these haven't been finalized since we're still working on code.
> That being said, let me share what my thoughts have been to date.
> 
>> SQL Row maps to a drill record?
> Correct
> 
>> And drill would not have a flat sibling structure of nodes, a.k.a. columns
>> but hierarchical nodes?
> Correct.  My general thinking is that a record is a DataValue.
> A DataValue can be one of three major types: a map (string:DataValue), an
> ordered list (DataValues[]), or a scalar DataValue.  Most commonly, the
> first DataValue in a record would be a map.  In the case of SQL/flat data
> (e.g. CSV), this map would only contain scalar values.
> 
>> Will drill access the contents of a record in a stream or document manner?
>> How large may a record be?
> For the first version of Drill, I was thinking that a record must fit
> entirely in memory.  Functions can interact with an entire record as they
> choose.
> 
>> Can I use XPath-like functions to access nodes?
> Generally, we hope so.  'Like' being the operative word here.  The path
> expressions that we're thinking of using are substantially simpler than the
> expressiveness of xpath.  Ultimately, I could see people creating a parser
> which takes in xquerys and converts them to Drill logical plans.  That
> being said, our goal is more for analytical queries than document
> transformations.
> 
>> All of the google bigquery Cook Book Examples seem to generate flat
>> Output, is this a limitation?
> In Drill, we don't plan to limit to flat output.  For v1, we're looking at
> supporting hierarchical expressions in sql 'as' aliases.  We're also
> looking at supporting selections at any level of hierarchy, not just the
> leaf level.  We then combine these with a concept of collision behavior
> control so that you can control how to merge multiple nested out values
> into a single output tree.  These will allow one to build a nested output
> object.  These are preliminary thoughts.  We need to write more and discuss
> more.
> 
> One thing to remember is that one of Drill's goals is to be flexible.
> Ultimately, different query languages may support different subsets of
> operations and no one query language may include all operators.
> 
> Hope that makes sense.
> 
> Jacques
> 
> On Sat, Jan 19, 2013 at 3:11 PM, Siprell, Stefan
> <stefan.siprell@exxeta.de> wrote:
> 
>> Aaaah studying the Big query docs helped. I may assume, that a SQL Row
>> maps to a drill record? And drill would not have a flat sibling structure
>> of nodes, a.k.a. columns but hierarchical nodes?   All of the google
>> bigquery Cook Book Examples seem to generate flat Output, is this a
>> limitation? If not how would i generate my hierarchical Output Model,
>> without using a groovy builder or xquery :-)
>> 
>> 
>> Stefan
>> 
>> Sent from my iPad
>> 
>> On 20.01.2013, at 00:01, "Jacques Nadeau" <jacques.drill@gmail.com> wrote:
>> 
>>> Fair enough.  Starting with big query syntax or SQL 2003 and flat data
>>> structures will work fine.  I'll try to write something meaningful up about
>>> sql and nested data structures.
>>> 
>>> Jacques
>>> 
>>> 
>>> 
>>> On Sat, Jan 19, 2013 at 2:54 PM, Siprell, Stefan
>>> <stefan.siprell@exxeta.de> wrote:
>>> 
>>>> Should I not just use this here as a reference?
>>>> 
>>>> https://developers.google.com/bigquery/docs/query-reference
>>>> 
>>>> I am a bit stumped to be honest. I am trying to think how to use SQL
>>>> efficiently on nested data structures.
>>>> 
>>>> Sent from my iPad
>>>> 
>>>> On 19.01.2013, at 19:51, "Jacques Nadeau" <jacques.drill@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>> * I drew a UML diagram. I saw that there is some Gliffy support in
>>>> Confluence, but the free account is pretty much useless. I used OmniGraffle
>>>> to draw the diagram, but this is payware on the Mac. Is there some
>>>> usable freeware alternative? Don't mention tigris :-)
>>>> 
>>>> 
>>>> I don't have any suggestions on this.
>>>> 
>>>> 
>>>> * I have some ideas on the queries, but I am not sure how I should specify
>>>> them. Should I use pseudo SQL? Prose? I saw the syntax document on the
>>>> server; is it mature enough that I can attempt to use its syntax? Is there a
>>>> BNF or, better, an ANTLR grammar I can use to check my syntax? Should I draw
>>>> one up while I am at it?
>>>> 
>>>> 
>>>> I suggest you target SQL2003 (including subqueries).  We're looking at how
>>>> to use Optiq's SQL parser for Drill.  Our goal is to stay as close as
>>>> possible to that spec but add the following extensions:
>>>> - Add flatten operator similar to BigQuery syntax
>>>> - Support use of selection and output identifiers using dotted/bracketed
>>>> notation.  E.g. "select person.children[0].age as
>>>> output.profile.firstChildAge"
>>>> - Support new functions that can accept nested values including collections
>>>> and maps.  For example "select ARRAY_LENGTH(person.children)".
>>>> 
>>>> Once you have some sql examples, the next goal would be to manually
>>>> translate those into Logical Plan syntax.  This syntax is still maturing so
>>>> I'd take it to the SQL stage first.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Stefan
>>>> 
>>>> 
>>>> 
>>>> On 19.01.2013, at 02:05, Jacques Nadeau <jacques.drill@gmail.com
>> <mailto:
>>>> jacques.drill@gmail.com>> wrote:
>>>> 
>>>> The wiki is up.  Michael and Stefan, it would be great if you started
>>>> putting your use case thoughts there.
>>>> 
>>>> Jacques
>>>> 
>>>> On Sun, Jan 13, 2013 at 3:31 PM, Ted Dunning <ted.dunning@gmail.com
>>>> <mailto:ted.dunning@gmail.com>>
>>>> wrote:
>>>> 
>>>> Ahh... yes.  That wiki.  I will ping infra again.
>>>> 
>>>> (I was attaching your comment to the wikipedia use case and had confused
>>>> myself)
>>>> 
>>>> On Sun, Jan 13, 2013 at 2:53 PM, Michael Hausenblas <
>>>> michael.hausenblas@gmail.com<mailto:michael.hausenblas@gmail.com>>
>> wrote:
>>>> 
>>>> 
>>>> What do you need from me?
>>>> 
>>>> Maybe I've overlooked something in which case I apologize - was
>>>> wondering
>>>> if the public Wiki for Drill is available where Stefan, I and others
>>>> can
>>>> write up the UC and queries.
>>>> 
>>>> Cheers,
>>>>             Michael
>>>> 
>>>> --
>>>> Michael Hausenblas
>>>> Ireland, Europe
>>>> http://mhausenblas.info/
>>>> 
>>>> On 13 Jan 2013, at 14:20, Ted Dunning <ted.dunning@gmail.com<mailto:
>>>> ted.dunning@gmail.com>> wrote:
>>>> 
>>>> What do you need from me?
>>>> 
>>>> 
>>>> On Sun, Jan 13, 2013 at 11:06 AM, Michael Hausenblas <
>>>> michael.hausenblas@gmail.com<mailto:michael.hausenblas@gmail.com>>
>> wrote:
>>>> 
>>>> as soon as we hear back from Ted re the Wiki we work there.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 

