drill-user mailing list archives

From Andrew Brust <andrew.br...@bluebadgeinsights.com>
Subject RE: what's the differenct between drill and optiq
Date Thu, 28 May 2015 15:09:12 GMT
Agreed, and very interesting.  Lots of people at Datameer seem impressed by Flink.

I have to look up Kylin...

-----Original Message-----
From: Jacques Nadeau [mailto:jacques@apache.org] 
Sent: Thursday, May 28, 2015 1:20 AM
To: user@drill.apache.org
Subject: Re: what's the differenct between drill and optiq

Andrew,

As others have pointed out there are definitely differences in how each different community
project leverages Calcite (remember, Apache Kylin, Phoenix and I believe Flink also use it).
Remember, Calcite--at its core--is a developer's toolkit that other applications/systems incorporate.
While an end user could use Calcite, the most common use is as an embedded library in a broader
system.

The great news is that the community is working together on an amazing shared
library and framework.

-Jacques



On Wed, May 27, 2015 at 10:10 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Andrew,
>
> Sorry for being cryptic.  Hanifi is more clear.  My point was directed 
> at the differences between where Hive may ultimately go and where 
> Drill is now.  Hanifi was providing a good summary of where Drill is now.
>
> As he said, Calcite does query parsing and planning.  Ultimately, it 
> will do the same for Hive.  Even so, Drill has extended Calcite's 
> planning capabilities in ways which are not used by Hive.  These 
> extensions allow Calcite to produce plans for the Drill execution 
> engine.  That execution engine is what Hanifi meant by flexible 
> distributed columnar execution with late binding.
>
> SQL is not normally a late binding language.  Instead, it shows its 
> long heritage by being a very statically typed language.  That static 
> typing is a problem in the modern world of flexible data and dealing 
> with this problem is a key goal of Drill.
>
> The key technological advance in Drill that enables it to address late 
> typing problems is something called the ANY type.  This is essentially 
> a way for the parser to punt the problem of resolving the type of some 
> value until the query is actually running.  At that point, Drill has 
> an empirical schema available for each record batch which can be used 
> to do final code generation and optimization.  If the empirical schema 
> changes due to changes in the data being processed, that code can be 
> regenerated as needed.
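
The late-binding idea described above can be illustrated with a small sketch. This is not Drill's code: `empirical_schema` and `LateBoundSum` are hypothetical names, Python closures stand in for generated code, and each batch is assumed to be non-empty with one type per column. The point is only that the typed kernel is chosen once the batch's actual schema is visible, and is regenerated only when that schema changes.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Schema = Tuple[Tuple[str, str], ...]  # ((column name, type name), ...)


def empirical_schema(batch: Dict[str, List[Any]]) -> Schema:
    """Hypothetical helper: read each column's type off the first value in the batch."""
    return tuple(sorted((name, type(vals[0]).__name__) for name, vals in batch.items()))


class LateBoundSum:
    """Sums one column whose type is unknown (think ANY) until a batch arrives."""

    def __init__(self, col: str) -> None:
        self.col = col
        self.schema: Optional[Schema] = None
        self.kernel: Optional[Callable[[List[Any]], float]] = None
        self.generations = 0  # how many times a kernel was (re)generated

    def _generate(self, schema: Schema) -> Callable[[List[Any]], float]:
        type_name = dict(schema)[self.col]
        self.generations += 1
        if type_name in ("int", "float"):
            return lambda vals: sum(vals)  # numeric data: no cast needed
        if type_name == "str":
            return lambda vals: sum(float(v) for v in vals)  # cast only for string data
        raise TypeError(f"unsupported type for {self.col}: {type_name}")

    def run(self, batch: Dict[str, List[Any]]) -> float:
        schema = empirical_schema(batch)
        if schema != self.schema:  # schema changed: regenerate the kernel
            self.schema = schema
            self.kernel = self._generate(schema)
        return self.kernel(batch[self.col])
```

Running two integer batches followed by a string batch through the same `LateBoundSum` instance generates a kernel twice: once for the first batch and once when the column's type changes.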
>
> This is a huge philosophical and design change that is hard to just 
> paste onto an existing engine.  Just as it would be next to impossible 
> to modify a Pascal or Fortran execution environment to do the type 
> inferencing and lazy execution that Scala or Haskell do, it is going 
> to be hard to extend Hive's entire execution environment to deal with 
> type dynamism.  Simply passing around dynamic types will not give 
> performance anywhere near what Drill does because of the inevitable 
> cost of type tag dispatching.
>
> To give just the simplest example, suppose you have data that used a 
> column named X to hold an integer for a long while and then switched 
> to using a column named Y to hold a floating point number.  To deal 
> with this, you might create a view which has a case statement that 
> uses the value of X or Y, whichever is non-null.  In conventional SQL 
> engines, the query parser and planner would generate code for this 
> case statement and it would execute for every record.  With Drill, 
> almost all record batches would have *either* X or Y.  Drill would 
> generate different code for those two 
> different patterns of data and that code would be generated with the 
> knowledge that X is null, or that Y is null.  As such, the optimizer 
> in the code generator would actually just completely remove the case 
> statement by evaluating it at code generation time.  By pushing that 
> code generation time very late in the execution, Drill would have no 
> perceptible penalty relative to uniformly typed code, but it would 
> have the ability to deal with non-uniform data.
>
>
> My original comment was an indefensible shorthand for all of this.  
> Things should be made as simple as possible, but no simpler, as the 
> great man said.
>
>
> On Wed, May 27, 2015 at 8:32 PM, Andrew Brust < 
> andrew.brust@bluebadgeinsights.com> wrote:
>
> > That makes sense.  Just having trouble mapping that back to Ted's 
> > comment.  But I tend to think that's me and my ignorance.
> >
> > -----Original Message-----
> > From: Hanifi Gunes [mailto:hgunes@maprtech.com]
> > Sent: Wednesday, May 27, 2015 4:48 PM
> > To: user
> > Subject: Re: what's the differenct between drill and optiq
> >
> > Calcite does parsing & planning of queries. Drill executes in a very 
> > flexible distributed columnar fashion with late binding.
> >
> > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > Andrew,
> > >
> > > What Hive does not have is the extensions that Drill has that 
> > > > allow SQL to be type flexible.  The ANY type and all of the 
> > > > implications it has, both in terms of implementation and user 
> > > > impact, are a really big deal.
> > >
> > >
> > >
> > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust < 
> > > andrew.brust@bluebadgeinsights.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Sent from my phone
> > > > <insert witty apology for typos here>
> > > >
> > > > ----- Reply message -----
> > > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavilli@gmail.com>
> > > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > > Subject: what's the differenct between drill and optiq
> > > > Date: Wed, May 27, 2015 8:33 AM
> > > >
> > > > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > > On May 27, 2015 6:01 PM, "Andrew Brust" < 
> > > > andrew.brust@bluebadgeinsights.com>
> > > > wrote:
> > > >
> > > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > > Can anyone here confirm or deny that?
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rajkumar Singh [mailto:rsingh@maprtech.com]
> > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Re: what's the differenct between drill and optiq
> > > > >
> > > > > > Optiq (now known as Calcite) is an API for query parsing, planning 
> > > > > > and optimization. Drill uses it for SQL parsing, validation 
> > > > > > and optimization. The Drill query planner applies its own custom 
> > > > > > planner rules to build the logical query plan.
> > > > >
> > > > > Rajkumar Singh
> > > > >
> > > > >
> > > > >
> > > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlijian@togeek.cn> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > I just want to know the difference between drill and optiq.
> > > > > >
> > > > > >
> > > > > > > Does Drill just 'extend' Optiq to support many other 
> > > > > > > 'stores' (Hadoop, MongoDB, ...)?
> > > > > >
> > > > > >
> > > > > > ---from davy
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>