drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hyunsik Choi <hyun...@apache.org>
Subject Re: DrQL grammar and parser
Date Tue, 28 Aug 2012 01:26:32 GMT
Hi Camuel,

I think that it may be not an upfront formalization if we focus on a
logical plan model. Since a logical plan model is generally equivalent to
an algebra, we don't need to concern physical execution methods (i.e.,
iteration models over nested data set).

--
Hyunsik Choi

On Mon, Aug 27, 2012 at 7:07 PM, Camuel Gilyadov <camuel@gmail.com> wrote:

> Of course optimizer must work with some intermediate form of query, which I
> think could be an object graph expressed for now by using ordinary
> programming language objects without much fuss (like Java or Python).
> However, I think its upfront formalization is going to develop into
> never-ending story and mainly because iteration over nested datasets is far
> from being clear, different nested dataset languages are not only differs
> in syntax but also in the iteration model itself. I think Rob
> Grzywinski started this discussion in separate thread already...
>
> Also regarding optimizer and DAG, as there is not much index/joins action
> going on what we have now is mostly a chain of transformations
> which of-course is also formally a DAG :) but thinking about it as a chain
> will simplify things at first, there is no much optimizations you can do
> with it. If you go one level down and consider scalar operations then you
> get more elaborate DAG of course.
> -----------------
> Let's separate issues:
>
> 1. Query Plan is distributed workload and that must be formalized and I
> think no one suggests otherwise. Also no one suggest other model than DAG
> except me. I suggest unrestricted graph just to keep backend useful for
> other stuff, for the purposes of DrQL DAG is more than adequate in my
> opinion. However, this is a DAG of identical nodes, and it follows physical
> data partitioning. Let's label it physical DAG in order not to confuse with
> logical query plan.
>
> 2. Open and somewhat confused issue is what actually runs a single node of
> the above mentioned physical DAG? Is this a formalized query plan or just
> arbitrary code?
>
> 3. Query plan formalization: main obstacle here is that the model of
> iterating nested datasets are far from clear. Particularly nor Dremel paper
> neither BigQuery reference describe well the behavior of querying nested
> datasets with all different subcases. There many other languages to query
> nested data but the iteration model varies significantly between them. For
> formalization we miss one another academic paper which
> would rigorously define canonical high-performance iteration model for
> nested datasets.
>
> 4. Another complicating factor is columnar optimization. Drill is going to
> be nested-columnar engine and as such part of query plan must be columnar.
> So full set of column-oriented and record-oriented primitives are needed
> record-construction primitives.
>
>
> On Mon, Aug 27, 2012 at 9:29 AM, Hyunsik Choi <hyunsik.choi@gmail.com
> >wrote:
>
> > Hi David,
> >
> > I agree with some of your claims. I also think that now DrQL may be
> enough
> > to Drill project.
> >
> > Even if we don't support various query languages, I think complex query
> > languages (like SQL and DrQL) should have an logical form in order to
> deal
> > a given query without considering actual physical information. It
> provides
> > an easy way to modify the query to be more optimized one (e.g., pushing
> > down projection, selection, and finding the best operator order) while
> the
> > optimized one is logically equivalent to the original query.
> >
> > Also, It would not hurt performance. For example, OLTP that processes a
> > query within a few milliseconds already employs such a logical plan
> > model. Although a logical plan is generic, it is not hugely different to
> > existing logical plan models.
> >
> > --
> > Hyunsik Choi
> >
> > On Mon, Aug 27, 2012 at 2:34 PM, David Gruzman <david@bigdatacraft.com
> > >wrote:
> >
> > > Hi,
> > > Dremel is high performance system. I think building something generic
> > > "inter-languages" will hurt performance.
> > > Having generic executor service we can add several different paradigms
> of
> > > the local computation (and even not local). But I think
> > > SQL like query language should be done in most efficient way.
> > > David
> > >
> > > On Mon, Aug 27, 2012 at 3:20 AM, Hyunsik Choi <hyunsik@apache.org>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > How about having a generic logical plan described as a DAG, where
> each
> > > > vertex indicates a logical operator including various annotations and
> > > each
> > > > edge represents a data flow. A DAG has much expressive power. Many
> > > > literatures have shown that most logical plans of various data
> > > manipulation
> > > > languages can be described as such a DAG.
> > > >
> > > > Additional languages have different ASTs, and they can be transformed
> > > into
> > > > the generic logical plan. In this case, we can reuse logical plan,
> > > logical
> > > > plan optimization, and physical execution plan. Besides, Drill may
> > > consider
> > > >  a global plan that represents the distributed execution plan. Since
> > the
> > > > global plan generally depends on the logical plan, we can also reuse
> > all
> > > > code related to the global plan.
> > > >
> > > > --
> > > > Hyunsik Choi
> > > >
> > > >
> > > > On Mon, Aug 27, 2012 at 6:22 AM, Ted Dunning <ted.dunning@gmail.com>
> > > > wrote:
> > > >
> > > > > Camuel,
> > > > >
> > > > > Do you have a grammar test suite that demonstrates the range of
> > > > > expressions?
> > > > >
> > > > > Also, I believe that some have a goal to use additional languages
> > > besides
> > > > > SQL like languages.  A limited version of pig, for instance, would
> be
> > > > very
> > > > > interesting.  To do this, it will be important to have a logical
> plan
> > > > > structure that is common for different syntaxes and is not limited
> to
> > > the
> > > > > idiosyncracies of any particular syntax.
> > > > >
> > > > > How do you think that should be handled?  Do you have an idea for
a
> > > > logical
> > > > > plan structure?
> > > > >
> > > > > On Sun, Aug 26, 2012 at 4:11 PM, Camuel Gilyadov <camuel@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > I've written and attached ANTLR grammar for DrQL which I assume
> is
> > > same
> > > > > as
> > > > > > BigQuery language described in Query Reference on BigQuery
> website.
> > > > This
> > > > > > grammar includes AST production rules.
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message