calcite-dev mailing list archives

From Khai Tran <kht...@linkedin.com.INVALID>
Subject Re: Piglet update
Date Mon, 13 Feb 2017 21:24:09 GMT
>
> If the Grunt parser is working better for you, we could consider using
> that instead of my hand-rolled parser. (But hopefully we can keep the
> VALUES operator; it’s a shame Pig latin doesn’t have this.)
>

Yes, Pig Latin does not have a VALUES operator, but I think VALUES is
only useful for learning purposes?

Initially, I considered extending your hand-rolled parser, but then
realized that real Pig scripts are much more complicated than the parser
can handle, especially those with Pig UDFs. I think the main reason people
are still using Pig is its flexible UDFs. With the Grunt parser, the work
left for me is just three conversions: Pig schema -> Calcite schema, Pig
expressions -> Calcite expressions, and Pig logical operators -> Calcite
logical operators. A few things are still missing from the Calcite standard
operators, such as multiset (Pig bag) projections (projecting any group of
columns from a multiset of rows), which I had to implement myself. After
that, Pig Latin is really just a subset of the language covered by Calcite.
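
To illustrate what a multiset (bag) projection means here, a minimal sketch
in plain Python follows. It only models the semantics, is not the converter
code discussed in this thread, and does not use any Calcite or Pig API;
`project_bag` and the sample field names are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project_bag(bag: List[Row], columns: Sequence[str]) -> List[Row]:
    """Keep only `columns` in every row of a bag.

    A bag is modeled as a list of dict rows (an assumption made for this
    sketch). Duplicates are preserved, because a bag is a multiset and not
    a set.
    """
    return [{col: row[col] for col in columns} for row in bag]


# Hypothetical grouped record: one group key plus a bag of rows.
group = {
    "dept": "eng",
    "emps": [
        {"name": "ann", "age": 30, "salary": 100},
        {"name": "bob", "age": 41, "salary": 120},
    ],
}

# Project two columns out of the nested bag, leaving the group key alone.
projected = {**group, "emps": project_bag(group["emps"], ["name", "salary"])}
```
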

> Yes, there are a number of issues outstanding. There is actually a test
> case for nested FOREACH in PigletTest.java, disabled. I figured we can work
> on these when people start using Piglet for real work. Log JIRA cases for
> the missing features and we can discuss how they could be implemented.
>
>
Implementing Pig nested FOREACH and FLATTEN is a bit tricky. I used
UNCOLLECT and LATERAL JOIN for converting the Pig inner logical plan inside
nested FOREACH.
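
As a rough picture of why FLATTEN maps onto an uncollect plus a lateral
join, here is a small sketch in plain Python. It models only the row-level
semantics, under the assumption that a bag is a list of dict rows and that
outer and inner field names do not collide; `flatten_bag` is a hypothetical
name, not a Calcite or Pig function.

```python
from typing import Dict, List

Row = Dict[str, object]


def flatten_bag(rows: List[Row], bag_field: str) -> List[Row]:
    """Un-nest `bag_field` and join each element back to its outer row.

    For every outer row, the inner loop plays the role of the uncollect
    over that row's bag, and pairing each element with the outer row plays
    the role of the lateral (correlated) join. An outer row whose bag is
    empty produces no output rows.
    """
    result: List[Row] = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != bag_field}
        for inner in row[bag_field]:
            # Assumes no name clash between outer and inner fields.
            result.append({**outer, **inner})
    return result


# Hypothetical input: each order carries a bag of line items.
orders = [
    {"order_id": 1, "items": [{"sku": "a"}, {"sku": "b"}]},
    {"order_id": 2, "items": []},
]
flat = flatten_bag(orders, "items")
```
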


> Are you aware of the work Eli Levine is doing on the Pig Adapter[1]? (It’s
> the opposite of Piglet — Pig on the bottom, rather than on the top — but it
> proves there is interest relating to Calcite-Pig integration.)
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-1598
>
>
>
Thanks for the pointer. I'm new to the community and would like to learn
more about what is going on in this area. The Pig Adapter sounds
interesting, but it may not be useful for our use case.

In our use case, we want to migrate Pig to another execution engine like
Spark or Presto. Calcite can serve as an intermediate representation to
decouple the language from the execution engines. BTW, are we going to have
support for SparkSQL and Presto in DatabaseProduct/SqlDialect?




>
> > On Feb 13, 2017, at 12:16 PM, Khai Tran <khtran@linkedin.com.INVALID>
> wrote:
> >
> > Hi all,
> >
> > I just want to check if anyone is working on Piglet. I've not seen any
> > commits on the source for a while.
> >
> > I have ~3K LOC for converting Pig scripts into Calcite logical plans. My
> > approach is different from the one in Piglet. I used the Grunt parser
> > from Pig to parse Pig scripts and then wrote the code to convert Pig
> > logical plans into Calcite logical plans. So technically, we can
> > translate any Pig script into a Calcite plan (and then into SQL).
> >
> > The interesting points about the conversion are handling Pig UDFs and
> > Pig flexible schemas, and converting Pig nested FOREACH operators.
> >
> > The code will be tested in LinkedIn production. I just want to check
> > if there is interest in the community so that I can ask for permission
> > to publish the code.
> >
> > Thanks,
> > Khai
>
>
