drill-user mailing list archives

From Andrew Brust <andrew.br...@bluebadgeinsights.com>
Subject RE: what's the differenct between drill and optiq
Date Thu, 28 May 2015 15:08:19 GMT
Absolutely nothing to apologize for, and the below explanation is very helpful.

FWIW, I certainly understood that Hive's use of Calcite offered relatively little in the way
of type flexibility/late binding, compared to Drill.  I get that Drill's entire raison d'etre
centers on this, and I never thought that Hive "had it too."  It was more a question of my being
surprised that the query planners had any common technology at all.  I have never coded in
Scala or Haskell, but I have coded plenty in C#, Pascal and VB, and I can appreciate the
analogy just by having experience with one half of it.

It's part of the reason I think Drill is so cool, and part of the reason why MapR did so well
in one of Gigaom's last Sector Roadmaps.

My "ponder question" is whether mainstream RDBMSes like Oracle and SQL Server will one day
add Drill-like late binding functionality.

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Thursday, May 28, 2015 1:10 AM
To: user@drill.apache.org
Subject: Re: what's the differenct between drill and optiq

Andrew,

Sorry for being cryptic.  Hanifi was clearer.  My point was directed at the differences
between where Hive may ultimately go and where Drill is now.  Hanifi was providing a good
summary of where Drill is now.

As he said, Calcite does query parsing and planning.  Ultimately, it will do the same for
Hive.  Even so, Drill has extended Calcite's planning capabilities in ways that are not used
by Hive.  These extensions allow Calcite to produce plans for the Drill execution engine.
That execution engine is what Hanifi meant by flexible distributed columnar execution with
late binding.

SQL is not normally a late binding language.  Instead, it shows its long heritage by being
a very statically typed language.  That static typing is a problem in the modern world of
flexible data and dealing with this problem is a key goal of Drill.

The key technological advance in Drill that enables it to address late typing problems is
something called the ANY type.  This is essentially a way for the parser to punt the problem
of resolving the type of some value until the query is actually running.  At that point, Drill
has an empirical schema available for each record batch which can be used to do final code
generation and optimization.  If the empirical schema changes due to changes in the data being
processed, that code can be regenerated as needed.
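The mechanism described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Drill's implementation: Drill generates and compiles Java code, whereas here picking a specialized Python function merely stands in for code generation. The names `empirical_schema` and `LateBoundSum`, and the dict-of-lists batch layout, are assumptions made for the example. A kernel is cached per observed batch schema and regenerated only when that schema changes.

```python
import math
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record batch: column name -> list of values (None stands for NULL).
Batch = Dict[str, List[Any]]
# Hypothetical empirical schema: (column name, observed type name) pairs.
Schema = Tuple[Tuple[str, str], ...]


def empirical_schema(batch: Batch) -> Schema:
    """Derive a schema from the values actually present in this batch."""
    result = []
    for name in sorted(batch):
        observed = {type(v).__name__ for v in batch[name] if v is not None}
        result.append((name, "|".join(sorted(observed)) or "null"))
    return tuple(result)


class LateBoundSum:
    """Sums one column, binding its type only once a batch has been seen."""

    def __init__(self, column: str) -> None:
        self.column = column
        self.kernels: Dict[Schema, Callable[[Batch], Any]] = {}
        self.generations = 0  # how many times "code generation" has run

    def _generate(self, schema: Schema) -> Callable[[Batch], Any]:
        # Stand-in for code generation: pick a kernel specialized to the observed type.
        self.generations += 1
        col_type = dict(schema).get(self.column, "null")
        col = self.column
        if col_type == "null":
            return lambda batch: 0  # column absent or entirely NULL in this batch
        if col_type == "int":
            return lambda batch: sum(v for v in batch[col] if v is not None)
        if col_type == "float":
            return lambda batch: math.fsum(v for v in batch[col] if v is not None)
        raise TypeError(f"no kernel for column {col!r} of type {col_type}")

    def process(self, batch: Batch) -> Any:
        schema = empirical_schema(batch)
        if schema not in self.kernels:  # new or changed schema: regenerate
            self.kernels[schema] = self._generate(schema)
        return self.kernels[schema](batch)


op = LateBoundSum("amount")
print(op.process({"amount": [1, 2, 3]}))   # int kernel generated
print(op.process({"amount": [4, None]}))   # same schema: kernel reused
print(op.process({"amount": [1.5, 2.5]}))  # schema changed: float kernel generated
print(op.generations)
```

In this sketch a batch that mixes ints and floats in one column raises `TypeError`; a real engine would need its own policy for that case.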

This is a huge philosophical and design change that is hard to just paste onto an existing
engine.  Just as it would be next to impossible to modify a Pascal or Fortran execution environment
to do the type inferencing and lazy execution that Scala or Haskell do, it is going to be
hard to extend Hive's entire execution environment to deal with type dynamism.  Simply passing
around dynamic types will not give performance anywhere near what Drill does because of the
inevitable cost of type tag dispatching.
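To make "type tag dispatching" concrete, here is a hypothetical Python sketch contrasting a generic operator that inspects a type tag on every value with a kernel written for a column whose type is already known. Python is itself dynamically typed, so this only models the extra branching per value; it says nothing about actual performance. `Tagged`, `add_dynamic`, `sum_dynamic`, and `sum_int_kernel` are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Tagged:
    """A dynamically typed value: a type tag travels with every payload."""
    tag: str  # "int" or "float" in this sketch
    payload: Union[int, float]


def add_dynamic(a: Tagged, b: Tagged) -> Tagged:
    # Generic addition: the tags of both operands are checked on every call.
    if a.tag == "int" and b.tag == "int":
        return Tagged("int", a.payload + b.payload)
    if a.tag in ("int", "float") and b.tag in ("int", "float"):
        return Tagged("float", float(a.payload) + float(b.payload))
    raise TypeError(f"cannot add {a.tag} and {b.tag}")


def sum_dynamic(values: List[Tagged]) -> Tagged:
    total = Tagged("int", 0)
    for v in values:  # one tag dispatch per value, for every record
        total = add_dynamic(total, v)
    return total


def sum_int_kernel(values: List[int]) -> int:
    # Specialized kernel: the column is known to hold ints, so there are no tag checks.
    total = 0
    for v in values:
        total += v
    return total


print(sum_dynamic([Tagged("int", 1), Tagged("int", 2), Tagged("float", 0.5)]))
print(sum_int_kernel([1, 2, 3]))
```

The specialized kernel is what late binding makes possible: once a batch's types are known, the per-value branches drop out of the inner loop.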

To give just the simplest example, suppose you have data that used a column named X to hold
an integer for a long while and then switched to using a column named Y to hold a floating-point
number.  To deal with this, you might create a view with a CASE statement that
uses the value of X or Y, whichever is non-null.  In conventional SQL engines, the query parser
and planner would generate code for this CASE statement, and it would execute for every record.
With Drill, almost all record batches would have *either* X or Y.  Drill would generate
different code for those two patterns of data, and that code would be generated with the
knowledge that X is null, or that Y is null.  As such, the optimizer in the code generator
would remove the CASE statement completely by evaluating it at code generation time.  By
pushing code generation very late into execution, Drill would have no perceptible penalty
relative to uniformly typed code, but it would have the ability to deal with non-uniform data.
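The X/Y example can be sketched as follows. The column names `x` and `y`, the dict-of-lists batch layout, and the function names are hypothetical, and the expression `CASE WHEN x IS NOT NULL THEN x ELSE y END` is an assumed stand-in for what such a view might contain. `case_row_by_row` evaluates the CASE for every record, as a conventionally planned query would; `specialize_case` looks at which columns a batch actually contains and, when only one is present, returns a kernel with the CASE folded away.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical record batch: column name -> values (None stands for NULL).
# A batch simply omits a column that never appears in its records.
Batch = Dict[str, List[Optional[Any]]]


def row_count(batch: Batch) -> int:
    return max((len(values) for values in batch.values()), default=0)


def column(batch: Batch, name: str) -> List[Optional[Any]]:
    """Return the named column, or all NULLs if this batch does not contain it."""
    if name in batch:
        return batch[name]
    return [None] * row_count(batch)


def case_row_by_row(batch: Batch) -> List[Optional[Any]]:
    # Conventional plan: CASE WHEN x IS NOT NULL THEN x ELSE y END, evaluated per record.
    return [x if x is not None else y for x, y in zip(column(batch, "x"), column(batch, "y"))]


def specialize_case(batch: Batch) -> Callable[[Batch], List[Optional[Any]]]:
    # Late "code generation": the batch's columns are known, so the CASE can be folded.
    if "y" not in batch:
        return lambda b: list(column(b, "x"))  # y is NULL throughout: CASE reduces to x
    if "x" not in batch:
        return lambda b: list(column(b, "y"))  # x is NULL throughout: CASE reduces to y
    return case_row_by_row  # mixed batch: keep the general CASE


old = {"x": [1, 2, None]}                   # before the switch: integer column x
new = {"y": [0.5, 1.5]}                     # after the switch: floating-point column y
mixed = {"x": [1, None], "y": [None, 2.5]}  # rare batch spanning the switch
for batch in (old, new, mixed):
    kernel = specialize_case(batch)
    print(kernel is case_row_by_row, kernel(batch))
```

Each folded kernel gives the same results as the per-record CASE on the batch it was generated for; the general CASE survives only for the few batches that contain both columns.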


My original comment was an indefensible shorthand for all of this.  Things should be made
as simple as possible, but no simpler, as the great man said.


On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <andrew.brust@bluebadgeinsights.com> wrote:

> That makes sense.  I'm just having trouble mapping that back onto Ted's
> comment.  But I tend to think that's me and my ignorance.
>
> -----Original Message-----
> From: Hanifi Gunes [mailto:hgunes@maprtech.com]
> Sent: Wednesday, May 27, 2015 4:48 PM
> To: user
> Subject: Re: what's the differenct between drill and optiq
>
> Calcite does parsing & planning of queries. Drill executes in a very 
> flexible distributed columnar fashion with late binding.
>
> On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunning@gmail.com>
> wrote:
>
> > Andrew,
> >
> > What Hive does not have are the extensions that Drill has that allow
> > SQL to be type flexible.  The ANY type, and all of its implications
> > both in terms of implementation and user impact, is a really big
> > deal.
> >
> >
> >
> > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <
> > andrew.brust@bluebadgeinsights.com> wrote:
> >
> > > Thanks!
> > >
> > >
> > > ----- Reply message -----
> > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavilli@gmail.com>
> > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > Subject: what's the differenct between drill and optiq
> > > Date: Wed, May 27, 2015 8:33 AM
> > >
> > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > On May 27, 2015 6:01 PM, "Andrew Brust" <
> > > andrew.brust@bluebadgeinsights.com> wrote:
> > >
> > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > Can anyone here confirm or deny that?
> > > >
> > > > -----Original Message-----
> > > > From: Rajkumar Singh [mailto:rsingh@maprtech.com]
> > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > To: user@drill.apache.org
> > > > Subject: Re: what's the differenct between drill and optiq
> > > >
> > > > Optiq (now known as Calcite) is an API for query parsing, planning
> > > > and optimization; Drill uses it for SQL parsing, validation
> > > > and optimization.  The Drill query planner applies its own custom
> > > > planner rules to build the query's logical plan.
> > > >
> > > > Rajkumar Singh
> > > >
> > > >
> > > >
> > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlijian@togeek.cn> wrote:
> > > > >
> > > > > Hi:
> > > > >
> > > > > I just want to know the difference between Drill and Optiq.
> > > > >
> > > > >
> > > > > Does Drill just 'extend' Optiq to support many other 'stores'
> > > > > (Hadoop, MongoDB, ...)?
> > > > >
> > > > >
> > > > > ---from davy
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> >
>
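Rajkumar's description in the thread (Optiq/Calcite supplies parsing, validation and planning, and Drill contributes its own planner rules) can be illustrated with a toy rule-based rewriter. Everything here is hypothetical: the node classes, the `push_filter_into_scan` rule, and `apply_rules` are invented for this sketch and are not Calcite's API, which is written in Java. The example rule, pushing a filter down into a table scan, is simply a common kind of rewrite chosen to show how a rule matches a pattern in the plan tree and replaces it.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str
    pushed_filter: Optional[str] = None  # predicate the storage layer evaluates itself


@dataclass(frozen=True)
class Filter:
    predicate: str
    child: "Node"


@dataclass(frozen=True)
class Project:
    columns: Tuple[str, ...]
    child: "Node"


Node = Union[Scan, Filter, Project]
Rule = Callable[[Node], Optional[Node]]  # returns a rewritten node, or None if no match


def push_filter_into_scan(node: Node) -> Optional[Node]:
    """Hypothetical custom rule: Filter(Scan) becomes a Scan carrying the predicate."""
    if isinstance(node, Filter) and isinstance(node.child, Scan) and node.child.pushed_filter is None:
        return replace(node.child, pushed_filter=node.predicate)
    return None


def apply_rules(node: Node, rules: List[Rule]) -> Node:
    """Rewrite children first, then retry the rules until none fires (a fixpoint)."""
    if isinstance(node, (Filter, Project)):
        node = replace(node, child=apply_rules(node.child, rules))
    for rule in rules:
        rewritten = rule(node)
        if rewritten is not None:
            return apply_rules(rewritten, rules)
    return node


plan = Project(("name",), Filter("age > 30", Scan("people")))
print(apply_rules(plan, [push_filter_into_scan]))
```

A planner built this way keeps the generic machinery (tree walking, fixpoint iteration) separate from the rules, which is what lets an engine add its own rules without changing the framework.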