Agreed, and very interesting. Lots of people at Datameer seem impressed by Flink.
I have to look up Kylin...
-----Original Message-----
From: Jacques Nadeau [mailto:jacques@apache.org]
Sent: Thursday, May 28, 2015 1:20 AM
To: user@drill.apache.org
Subject: Re: what's the differenct between drill and optiq
Andrew,
As others have pointed out, there are definitely differences in how each community
project leverages Calcite (Apache Kylin, Phoenix and, I believe, Flink also use it).
Remember, Calcite--at its core--is a developer's toolkit that other applications/systems incorporate.
While an end user could use Calcite, the most common use is as an embedded library in a broader
system.
The great news is that the community is working together on an amazing shared
library and framework.
-Jacques
On Wed, May 27, 2015 at 10:10 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Andrew,
>
> Sorry for being cryptic. Hanifi is more clear. My point was directed
> at the differences between where Hive may ultimately go and where
> Drill is now. Hanifi was providing a good summary of where Drill is now.
>
> As he said, Calcite does query parsing and planning. Ultimately, it
> will do the same for Hive. Even so, Drill has extended Calcite's
> planning capabilities in ways which are not used by Hive. These
> extensions allow Calcite to produce plans for the Drill execution
> engine. That execution engine is what Hanifi meant by flexible
> distributed columnar execution with late binding.
>
> SQL is not normally a late binding language. Instead, it shows its
> long heritage by being a very statically typed language. That static
> typing is a problem in the modern world of flexible data and dealing
> with this problem is a key goal of Drill.
>
> The key technological advance in Drill that enables it to address late
> typing problems is something called the ANY type. This is essentially
> a way for the parser to punt the problem of resolving the type of some
> value until the query is actually running. At that point, Drill has
> an empirical schema available for each record batch which can be used
> to do final code generation and optimization. If the empirical schema
> changes due to changes in the data being processed, that code can be
> regenerated as needed.
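
To make the idea above concrete, here is a minimal Python sketch of deferring type resolution until a batch arrives and regenerating code only when the observed schema changes. Everything in it is an assumption for illustration: the names `empirical_schema`, `DoubleV` and `process` are hypothetical, the "generated code" is just a Python closure, and none of this is Drill's actual API or implementation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical names throughout; this is not Drill's API.
Schema = Tuple[Tuple[str, str], ...]
Batch = Dict[str, list]


def empirical_schema(batch: Batch) -> Schema:
    """Read the schema off the data: (column, type name) for each column.

    Assumes every column is non-empty and uniformly typed within a batch.
    """
    return tuple(sorted((name, type(col[0]).__name__) for name, col in batch.items()))


class DoubleV:
    """Evaluates `v * 2` where the type of `v` is unknown until data arrives."""

    def __init__(self) -> None:
        self._generated: Dict[Schema, Callable[[list], list]] = {}
        self.generations = 0  # how many times code had to be (re)generated

    def _generate(self, schema: Schema) -> Callable[[list], list]:
        # Stand-in for real code generation: pick a type-specific kernel once.
        self.generations += 1
        v_type = dict(schema)["v"]
        if v_type == "int":
            return lambda col: [x * 2 for x in col]
        if v_type == "float":
            return lambda col: [x * 2.0 for x in col]
        if v_type == "str":
            return lambda col: [float(x) * 2.0 for x in col]
        raise TypeError(f"no kernel for type {v_type}")

    def process(self, batch: Batch) -> list:
        schema = empirical_schema(batch)
        kernel = self._generated.get(schema)
        if kernel is None:  # first batch, or the schema changed
            kernel = self._generated[schema] = self._generate(schema)
        return kernel(batch["v"])
```

Feeding this operator two integer batches and then a float batch would trigger generation twice: once for the first schema seen and once when the schema changes. Batches that repeat a known schema reuse the kernel already generated for it.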
>
> This is a huge philosophical and design change that is hard to just
> paste onto an existing engine. Just as it would be next to impossible
> to modify a Pascal or Fortran execution environment to do the type
> inferencing and lazy execution that Scala or Haskell do, it is going
> to be hard to extend Hive's entire execution environment to deal with
> type dynamism. Simply passing around dynamic types will not give
> performance anywhere near what Drill does because of the inevitable
> cost of type tag dispatching.
>
> To give just the simplest example, suppose you have data that used a
> column named X to hold an integer for a long while and then switched
> to using a column named Y to hold a floating point number. To deal
> with this, you might create a view which has a case statement that
> uses the value of X or Y, whichever is non-null. In conventional SQL
> engines, the query parser and planner would generate code for this
> case statement and it would execute for every record. With Drill,
> almost all record batches would have *either* X or Y. Drill would
> generate different code for those two
> different patterns of data and that code would be generated with the
> knowledge that X is null, or that Y is null. As such, the optimizer
> in the code generator would actually just completely remove the case
> statement by evaluating it at code generation time. By pushing that
> code generation time very late in the execution, Drill would have no
> perceptible penalty relative to uniformly typed code, but it would
> have the ability to deal with non-uniform data.
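
The X/Y example above can be sketched roughly as follows. This is only an illustration under stated assumptions: `case_per_record`, `generate` and `evaluate` are hypothetical names, closures stand in for generated code, and a column that is present in a batch is assumed to contain no nulls. It is not how Drill is actually implemented.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Batch = Dict[str, List[Optional[float]]]


def case_per_record(batch: Batch) -> list:
    """Conventional plan: CASE WHEN X IS NOT NULL THEN X ELSE Y, run per row."""
    n = len(next(iter(batch.values())))
    xs = batch.get("X", [None] * n)  # a missing column reads as all nulls
    ys = batch.get("Y", [None] * n)
    return [float(x) if x is not None else y for x, y in zip(xs, ys)]


def generate(columns: FrozenSet[str]) -> Callable[[Batch], list]:
    """Resolve the CASE once per batch schema instead of once per record."""
    has_x, has_y = "X" in columns, "Y" in columns
    if has_x and not has_y:
        # Y is known to be null for the whole batch: the CASE folds to X.
        return lambda batch: [float(x) for x in batch["X"]]
    if has_y and not has_x:
        # X is known to be null for the whole batch: the CASE folds to Y.
        return lambda batch: list(batch["Y"])
    if has_x and has_y:
        # Mixed batch: fall back to the per-record branch.
        return case_per_record
    raise KeyError("batch has neither X nor Y")


def evaluate(batches: List[Batch]) -> List[list]:
    """Run each batch with code specialized to its observed columns."""
    generated: Dict[FrozenSet[str], Callable[[Batch], list]] = {}
    results = []
    for batch in batches:
        columns = frozenset(batch)
        if columns not in generated:
            generated[columns] = generate(columns)
        results.append(generated[columns](batch))
    return results
```

For batches that carry only X or only Y, the specialized code contains no branch at all, yet it gives the same answers as the per-record CASE.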
>
>
> My original comment was an indefensible shorthand for all of this.
> Things should be made as simple as possible, but no simpler, as the
> great man said.
>
>
> On Wed, May 27, 2015 at 8:32 PM, Andrew Brust <
> andrew.brust@bluebadgeinsights.com> wrote:
>
> > That makes sense. Just having trouble mapping that back on Ted's
> > comment. But I tend to think that's me and my ignorance.
> >
> > -----Original Message-----
> > From: Hanifi Gunes [mailto:hgunes@maprtech.com]
> > Sent: Wednesday, May 27, 2015 4:48 PM
> > To: user
> > Subject: Re: what's the differenct between drill and optiq
> >
> > Calcite does parsing & planning of queries. Drill executes in a very
> > flexible distributed columnar fashion with late binding.
> >
> > On Wed, May 27, 2015 at 8:34 AM, Ted Dunning <ted.dunning@gmail.com>
> > wrote:
> >
> > > Andrew,
> > >
> > > > What Hive does not have is the extensions that Drill has that
> > > > allow SQL to be type flexible. The ANY type and all of its
> > > > implications, both in terms of implementation and user impact,
> > > > are a really big deal.
> > >
> > >
> > >
> > > On Wed, May 27, 2015 at 6:08 AM, Andrew Brust <
> > > andrew.brust@bluebadgeinsights.com> wrote:
> > >
> > > > Thanks!
> > > >
> > > > Sent from my phone
> > > > <insert witty apology for typos here>
> > > >
> > > > ----- Reply message -----
> > > > From: "PHANI KUMAR YADAVILLI" <phanikumaryadavilli@gmail.com>
> > > > To: "user@drill.apache.org" <user@drill.apache.org>
> > > > Subject: what's the differenct between drill and optiq
> > > > Date: Wed, May 27, 2015 8:33 AM
> > > >
> > > > > Yes, Hive uses Calcite. You can refer to the Hive documentation.
> > > > On May 27, 2015 6:01 PM, "Andrew Brust" <
> > > > andrew.brust@bluebadgeinsights.com>
> > > > wrote:
> > > >
> > > > > Folks at Hortonworks told me that Hive now uses Calcite as well.
> > > > > Can anyone here confirm or deny that?
> > > > >
> > > > > -----Original Message-----
> > > > > From: Rajkumar Singh [mailto:rsingh@maprtech.com]
> > > > > Sent: Wednesday, May 27, 2015 6:52 AM
> > > > > To: user@drill.apache.org
> > > > > Subject: Re: what's the differenct between drill and optiq
> > > > >
> > > > > > Optiq (now known as Calcite) is an API for query parsing,
> > > > > > planning and optimization. Drill uses it for SQL parsing,
> > > > > > validation and optimization. The Drill query planner applies
> > > > > > its own custom planner rules to build the logical query plan.
> > > > >
> > > > > Rajkumar Singh
> > > > >
> > > > >
> > > > >
> > > > > > > On May 27, 2015, at 12:04 PM, 陈礼剑 <chenlijian@togeek.cn> wrote:
> > > > > >
> > > > > > Hi:
> > > > > >
> > > > > > I just want to know the difference between drill and optiq.
> > > > > >
> > > > > >
> > > > > > > Does Drill just 'extend' Optiq to support many other
> > > > > > > 'stores' (Hadoop, MongoDB, ...)?
> > > > > >
> > > > > >
> > > > > > ---from davy
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>