drill-dev mailing list archives

From Camuel Gilyadov <cam...@gmail.com>
Subject Re: logical plan design coming together
Date Sun, 14 Oct 2012 14:25:29 GMT
On Fri, Oct 12, 2012 at 11:32 AM, Julian Hyde <julianhyde@gmail.com> wrote:

> For those implementing parsing & validation of the query language, please
> let me share my hard-earned wisdom...
> 1. Separate parsing and validation. The parser should do the absolute
> minimum of validation. Don't try to validate identifiers. Don't do any
> type-checking. It will make errors better ('This function needs a boolean
> parameter' versus 'Expecting "true" or "false" or "<token> and" or 101
> other possibilities'). It also allows the parser to stay focused on one task
> which is difficult enough: converting text into a parse tree.

Completely agree. I call them the parser stage and the semantic analysis
stage, and they must not be interleaved. Semantic analysis must start only
after the complete query is parsed. Moreover, I have a hard time separating
semantic validation logic from semantic analysis logic, so I decided that
the parser will only parse and will not bother with checks like resolving
identifiers. Even if it did, semantic analysis could still detect subtle new
errors in the query structure. So let's assign parsing to the parser and
semantic validation to a semantic analyzer that is completely separate from
the parser.

In particular, the parser will not differentiate between built-in functions
and custom functions. So the parser will not "know" about some reserved
keywords of DrQL, and I think that is a good thing.

In other words, in modern XML/JSON terms :) I would say that the parser must
check the "well-formedness" of the DrQL and the semantic analyzer must do
the schema validation.
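
To make the split concrete, here is a minimal sketch in Python. Everything in
it is an assumption made up for illustration: the toy grammar (names, numbers
and function calls) is not DrQL, and FUNCTIONS and COLUMNS are hypothetical
stand-ins for whatever catalog the analyzer would really consult. The parser
only checks that the text is well-formed; it has no idea which function names
exist. The analyzer runs afterwards on the finished tree.

```python
import re
from dataclasses import dataclass

# Toy grammar (an assumption, not DrQL):
#   expr := NAME '(' [expr {',' expr}] ')' | NAME | NUMBER

@dataclass(frozen=True)
class Call:
    name: str
    args: tuple

@dataclass(frozen=True)
class Ident:
    name: str

@dataclass(frozen=True)
class Number:
    value: int

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[0-9]+|[(),])")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at or after offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(text):
    """Stage 1: text -> tree. Checks well-formedness only."""
    tokens = tokenize(text)
    node, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node

def _parse_expr(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("(", ")", ","):
        raise SyntaxError(f"unexpected token {tok!r}")
    if tok.isdigit():
        return Number(int(tok)), pos + 1
    if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
        pos += 2
        if pos < len(tokens) and tokens[pos] == ")":
            return Call(tok, ()), pos + 1
        args = []
        while True:
            arg, pos = _parse_expr(tokens, pos)
            args.append(arg)
            if pos >= len(tokens):
                raise SyntaxError("missing ')'")
            if tokens[pos] == ")":
                # Built-in or custom, known or unknown: the parser does not care.
                return Call(tok, tuple(args)), pos + 1
            if tokens[pos] != ",":
                raise SyntaxError(f"expected ',' or ')' but got {tokens[pos]!r}")
            pos += 1
    return Ident(tok), pos + 1

# Hypothetical catalog used only by the analyzer: function name -> arity.
FUNCTIONS = {"count": 1, "sum": 1, "my_udf": 2}
COLUMNS = {"a", "b"}

def analyze(node, errors=None):
    """Stage 2: runs on a complete tree and collects semantic errors."""
    errors = [] if errors is None else errors
    if isinstance(node, Call):
        arity = FUNCTIONS.get(node.name)
        if arity is None:
            errors.append(f"unknown function '{node.name}'")
        elif arity != len(node.args):
            errors.append(
                f"function '{node.name}' takes {arity} argument(s), "
                f"got {len(node.args)}"
            )
        for arg in node.args:
            analyze(arg, errors)
    elif isinstance(node, Ident) and node.name not in COLUMNS:
        errors.append(f"unknown column '{node.name}'")
    return errors
```

With this split, parse("my_udf(a, nosuch(1))") succeeds because the text is
well-formed, and only analyze() complains about the unknown function.
Adding a new built-in means touching the catalog, not the grammar.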

> 2. During the validation phase, do not modify the parse tree. If you need
> to annotate each node with a type, put it into a map from parse tree node
> -> type, not into a field in each node. Put any state you need (e.g. scope
> for resolving identifiers) into a temporary state that exists only during
> validation (think of the visitor pattern). And definitely do not do any
> tree-surgery. If you need to rewrite the tree, do it post validation. (In
> the planner, or just before planning, is a good time.) See
> http://en.wikipedia.org/wiki/Immutable_object.

Well, I understand the point here. However, I still think it is worth putting
all the work of converting the parse tree to an AST on ANTLR's shoulders,
saving us this chunk of logic altogether. The price to pay is somewhat
cryptic error messages when the DrQL is not even parsable, or is not
"well-formed" if you like that term more. If DrQL were a stable language
following some standard, then I would back the approach of hand-coded
parse tree => AST conversion. However, DrQL syntax will most probably keep
evolving, to say the least, so why spend time hand-coding parse tree => AST
conversion when it will be outdated in a few weeks?
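
For reference, this is roughly how I read the side-table idea in point 2,
sketched in Python under my own assumptions. The node classes, the type names
("boolean", "int") and the Validator class are all hypothetical and not taken
from the design document. The tree is immutable, the inferred types go into a
map from node to type, and the scope lives only inside the validator object.

```python
from dataclasses import dataclass

# eq=False keeps identity-based hashing, so two structurally equal nodes in
# different places of the tree get separate entries in the side table.
@dataclass(frozen=True, eq=False)
class Literal:
    value: object   # bool or int in this sketch

@dataclass(frozen=True, eq=False)
class Column:
    name: str

@dataclass(frozen=True, eq=False)
class And:
    left: object
    right: object

class Validator:
    """Visitor holding all validation state; discard it when done."""

    def __init__(self, scope):
        self.scope = scope    # column name -> type name
        self.types = {}       # parse tree node -> type name (the side table)
        self.errors = []

    def visit(self, node):
        # Dispatch on the node class name, e.g. visit_And for And nodes.
        method = getattr(self, "visit_" + type(node).__name__)
        node_type = method(node)
        self.types[node] = node_type   # annotate in the map, not on the node
        return node_type

    def visit_Literal(self, node):
        return "boolean" if isinstance(node.value, bool) else "int"

    def visit_Column(self, node):
        column_type = self.scope.get(node.name)
        if column_type is None:
            self.errors.append(f"unknown column '{node.name}'")
            return "unknown"
        return column_type

    def visit_And(self, node):
        for child in (node.left, node.right):
            child_type = self.visit(child)
            if child_type not in ("boolean", "unknown"):
                self.errors.append(
                    f"AND needs a boolean operand, got {child_type}"
                )
        return "boolean"
```

Validating And(Column("is_active"), Literal(5)) with the scope
{"is_active": "boolean"} reports that AND got an int operand, and the tree
itself is left exactly as the parser produced it. Any rewriting would then
happen in a later pass that builds new nodes.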

> Julian
> On Oct 12, 2012, at 10:34 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > Great comments.
> >
> > One particular high-level comment that Julian made is a criticism that I
> > have made in the past of other projects.  It is probably good for my
> > character to be on the receiving side of this criticism for once.
> >
> > The question is why should we use/invent a new concrete syntax when JSON
> > would do just as well (I am dropping the XML part of the suggestion due to
> > known prejudices on this list).
> >
> > I don't have a good answer to this question.  It makes certain problems
> > quite a bit easier.  Moreover, I have said in the past that it is nuts to
> > re-invent concrete syntax for config files and extension languages like
> > this.
> >
> > My course going forward is that I think I will put down both syntaxes and
> > let folks form their own opinion.  Using JSON will definitely move things
> > ahead more quickly since other folks have done the parser for us.
> >
> > On Fri, Oct 12, 2012 at 12:05 AM, Julian Hyde <julianhyde@gmail.com> wrote:
> >
> >> Ted,
> >>
> >> Great start. I've made some comments on the doc.
> >>
> >> Julian
> >>
> >> On Oct 11, 2012, at 10:48 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> >>
> >>> The design for the logical plan is coming together.  Anybody should be able
> >>> to get to the interim design document at
> >>>
> >>> https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
> >>>
> >>> You should also be able to see the discussion so far.  Many thanks to
> >>> Timothy Chen for kibitzing very well as I wrote.  His astute observations
> >>> and questions were critical.
> >>>
> >>> I have to go sleep now, but it would be great to see progress on this while
> >>> I sleep.  Remember that comments and questions are as valuable (or more so)
> >>> than text.  Remember also, this document has a complete history so we can
> >>> reconstruct it no matter what happens.
> >>>
> >>> I would particularly like eyes on this (if practical) from Camuel, Jason,
> >>> Gera and Julian Hyde.  They have had some very good thoughts about this
> >>> layer in the past and probably will spot several errors in what I have
> >>> written.
> >>>
> >>> The plan for this document as it stabilizes is to put it into the web-site
> >>> under the documentation area.  We will probably want to do that before it
> >>> really is done to make sure that people can find it easily and to ensure a
> >>> checkpoint is in Apache-land.
> >>>
> >>> See y'all tomorrow.
> >>
> >>
