calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Mior <mm...@uwaterloo.ca>
Subject Re: Cosette / Apache Calcite
Date Thu, 08 Feb 2018 23:25:05 GMT
Shumo and Alvin,

That's great to hear (especially the part about the lack of bugs)! That
paper is next on my stack to read. I'd love to work on getting these tests
better integrated with Calcite. It would be really helpful if you could
provide more detail on how you're currently running the tests. Ideally if
you could provide a way for me to easily replicate your test
infrastructure, that would probably help give me a better idea on how we
can help with the integration.

Cheers,
--
Michael Mior
mmior@apache.org

2018-02-08 17:56 GMT-05:00 Shumo Chu <shumo.chu@gmail.com>:

> Hi, Calcite Devs!
>
> We wanted to update you on the current status of using Cosette to help
> check the correctness of the Calcite planner rules [
> https://issues.apache.org/jira/browse/CALCITE-1977]:
>
> So far we have used Cosette to check whether Calcite's transformations are
> correct by using the test cases in RelOptRulesTest.java.
>
> For each test case, we use RelToSqlConverter to revert the query plan
> before Calcite's rewrite back to SQL (call it Q1) and similarly for the
> query plan after Calcite's rewrite (Q2). We then use Cosette to check
> whether Q1 and Q2 are semantically equivalent.
>
> As Calcite's test cases cover many SQL features, some of them are not
> supported by Cosette yet. Out of the 39 (give number) test cases that use
> SQL features supported by Cosette, Cosette is now able to formally prove
> that Calcite's rewrite in 33 of them are correct. This includes a few
> fairly complicated ones, like "testPushFilterPassAggThree" (
> https://github.com/apache/calcite/blob/master/core/src/test
> /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good news is
> that we haven't found any bugs so far :)
>
> We have also used the test cases to improve Cosette. For example, Cosette
> now supports checking equivalence under database integrity constraints as
> preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
> interested you can read about our latest system at
> https://arxiv.org/abs/1802.02229
>
> We would like to improve Cosette's parser so that we can handle the
> remaining Calcite test cases. Meanwhile, we would also like to discuss how
> to integrate the two tools in a more systematic way, perhaps as part of the
> Calcite test infrastructure rather than us manually extracting the test
> cases from RelOptRulesTest (RelToSqlConverter also doesn't always succeed
> in reversion as you may know). Anyone interested in discussing more? If so
> how should we proceed?
>
> Best
> Shumo & Alvin
>
> On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <shumo.chu@gmail.com> wrote:
>
> > I create a link to the image:
> >
> > https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png
> >
> > In terms of integration, what I can imagine is to call Cosette web API in
> > parallel when running calcite test cases, like RelOptRulesTest, then
> > perhaps Cosette result can give developers either extra confidence or
> > counterexample if there is a bug.
> >
> >
> >
> > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mmior@uwaterloo.ca>
> wrote:
> >
> >> Shumo,
> >>
> >> Thanks for the update! Unfortunately I don't believe inline images and
> >> attachments are not supported on ASF mailing lists so if you have
> another
> >> way of sharing that, would be interested to see. Glad to hear the test
> >> cases have been useful to you. Also great to see that Calcite seems to
> be
> >> faring well so far :)
> >>
> >> You're right that the conversion of plans to SQL definitely has some
> >> missing pieces. If there are any that are particularly problematic in
> your
> >> with Cosette, please let us know. For any bugs you encounter, it would
> be
> >> great if you could file a JIRA so we can work on fixing these (
> >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> >>
> >> --
> >> Michael Mior
> >> mmior@apache.org
> >>
> >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <shumo.chu@gmail.com>:
> >>
> >> > Hi Michael and Calcite Devs:
> >> >
> >> > That's great!
> >> >
> >> > We also hijacked the test harness of RelOpRule Test and generated SQL
> >> > using before and after optimization logical plans. (The reason that I
> >> use
> >> > query plan rather than the input SQL is that when Calcite generates
> SQLs
> >> > from query plans, it adds the correct table reference of attributes.
> >> > Cosette currently cannot handle unqualified attribute references). We
> >> got
> >> > 232 before and after SQL after some tweaking. You can find the SQL
> >> queries
> >> > that we get here: https://github.com/uwdb/Cosett
> >> e/blob/master/examples/c
> >> > alcite/calcite_tests.json.
> >> >
> >> > We then ran all of them using Cosette. Cosette can currently support
> 39
> >> of
> >> > 232 queries. Out of these 39 queries, the results are:
> >> >
> >> > EQ           9
> >> > UNKNOWN     27
> >> > NEQ          3
> >> >
> >> > EQ means our Coq backend found a valid proof for the equivalence of
> the
> >> > two queries.
> >> >
> >> > UNKNOWN means our model checker could not find a counterexample to
> prove
> >> > that the queries are not equivalent (within a time bound of 3 secs).
> You
> >> > can interpret that as "likely equal".
> >> >
> >> > NEQ means our model checker found a counterexample to prove the
> queries
> >> > are not equivalent. These are all the cases that there are some
> >> > preconditions that Calcite understands but we haven't yet put into
> >> Cosette.
> >> > For example (testPushSemiJoinPastJoinRuleLeft):
> >> >
> >> >  query q1 `SELECT EMP.ENAME
> >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> >> >
> >> > query q2 `SELECT EMP1.ENAME
> >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN EMP AS
> EMP2
> >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT AS
> >> DEPT1
> >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN EMP AS
> EMP3
> >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> >> >
> >> >
> >> > These two queries are equivalent under the precondition that EMPNO is
> >> the
> >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and that is
> >> > indeed the case!
> >> >
> >> > We are extending Cosette to support preconditions such as primary keys
> >> and
> >> > foreign keys now and we should be able to handle these soon.
> >> >
> >> > For the queries that we cannot handle, here is the break down of the
> >> > issues:
> >> >
> >> > [image: Inline image 1]
> >> >
> >> > We support a large part of group by queries, GROUPBY here means some
> >> > grouping features that we don't support, e.g., group by on arbitrary
> >> > expressions. Although we might not be able to support all of these SQL
> >> > features, we are definitely working on adding more SQL features to
> >> Cosette.
> >> >
> >> > With all above, we can try to integrate Cosette as an extra layer to
> >> > ensure developer's confidence. Currently, we are working on SIGMOD
> >> deadline
> >> > (Nov. 2) but will devote more time into this afterward.
> >> >
> >> > One thing worth mentioning is that we plan to include all these
> calcite
> >> > examples as the regression test for Cosette so that we can make
> Cosette
> >> > more powerful and practical. These queries are great benchmarks for
> >> Cosette
> >> > and are super valuable!
> >> >
> >> > Meanwhile, It would be also great if calcite's logical plan to SQL
> >> > converter can be improved. It currently doesn't support all queries
> and
> >> in
> >> > some cases, it actually generates illegal SQL. (happy to submit
> detailed
> >> > bug report if you think that would be helpful).
> >> >
> >> > Best
> >> > Shumo
> >> >
> >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mmior@uwaterloo.ca>
> >> wrote:
> >> >
> >> >> This took me longer than expected to get around to, but hopefully the
> >> >> below
> >> >> is helpful:
> >> >>
> >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> >> >>
> >> >> I just did some basic (and very hacky) instrumentation of
> >> RelOptRulesTest
> >> >> to dump SQL before and after rules have been applied. The file
> >> consists of
> >> >> the name of the test followed by the original and then the rewritten
> >> SQL.
> >> >>
> >> >> Many of the tests are missing for various reasons, but there's still
> >> 189
> >> >> examples there to play with. Let me know if any particular aspects
of
> >> the
> >> >> SQL are problematic. The "before" SQL is handwritten for the tests
> and
> >> the
> >> >> "after" is ANSI SQL as generated by Calcite from the resulting
> logical
> >> >> plan.
> >> >>
> >> >> --
> >> >> Michael Mior
> >> >> mmior@apache.org
> >> >>
> >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <amoghm@qubole.com>:
> >> >>
> >> >> > >>> There might be applications in materialized views.
A query Q
> can
> >> >> use a
> >> >> > materialized view V if V covers Q. In other words if >>>Q
== R(V)
> >> where
> >> >> R
> >> >> > is some sequence of relational operators. Given Q and V, Cosette
> >> could
> >> >> > perhaps analyze and either >>>return R (success) or return
that V
> >> does
> >> >> not
> >> >> > cover Q (failure).
> >> >> >
> >> >> > >>This resembles the problem of deciding whether a given
relation
> >> >> > (expressed as a query) is contained in another one. It will >>take
> >> some
> >> >> > work for Cosette to be able to handle this but it definitely sounds
> >> >> > interesting. Do you have an application in mind? >>One of
them
> might
> >> be
> >> >> to
> >> >> > determine whether previously cached results can be used.
> >> >> >
> >> >> > One simple idea to start here is to replace a naive solver we
have
> in
> >> >> > Calcite for checking if one predicate implies another predicate.
We
> >> >> call it
> >> >> > RexImplicationChecker in Calcite and if we can replace or help
it
> >> with
> >> >> > Constraint solver of Cosette which says if a particular implication
> >> is a
> >> >> > tautology then that would help a great deal.
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> >> >> akcheung@cs.washington.edu>
> >> >> > wrote:
> >> >> >
> >> >> > > Hi Julian et al,
> >> >> > >
> >> >> > > Thanks for your interest in Cosette. Your suggestions make
a lot
> of
> >> >> > sense.
> >> >> > > We have done some initial work and would like to get your
> feedback
> >> on
> >> >> how
> >> >> > > to integrate the two tools together.
> >> >> > >
> >> >> > > > One obvious idea is to use Cosette to audit Calcite’s
query
> >> >> > > transformation rules. Each rule is supposed to preserve semantics
> >> but
> >> >> > > (until Cosette) we had to trust the author of the rule. We
could
> >> >> convert
> >> >> > > the before and after relational expressions to SQL, and then
ask
> >> >> Cosette
> >> >> > > whether those are equivalent. We could enable this check
in
> >> Calcite’s
> >> >> > test
> >> >> > > suite, during which many thousands of rules are fired.
> >> >> > >
> >> >> > > Indeed. We have browsed through the Calcite rules and
> reformulated
> >> a
> >> >> few
> >> >> > > of them using our Cosette language:
> >> >> > >
> >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> >> > > rules/FilterMergeRule.java) --> https://demo.cosette.cs.washin
> >> >> gton.edu/
> >> >> > > (click conjunctive select from the dropdown menu)
> >> >> > >
> >> >> > > 2. Join commute (https://github.com/apache/cal
> >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> >> > > rules/JoinCommuteRule.java) --> Join commute from the
demo
> website
> >> >> above
> >> >> > >
> >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> >> >> > > rules/JoinProjectTransposeRule.java) --> Join Proj. Trans.
from
> >> the
> >> >> demo
> >> >> > > website above
> >> >> > >
> >> >> > > As we are not very familiar with the Calcite code base, we
have
> >> tried
> >> >> our
> >> >> > > best to guess the intention of each rule based on the
> >> documentation,
> >> >> > please
> >> >> > > feel free to point out if we made mistakes.
> >> >> > >
> >> >> > > As you can see, the Cosette language is pretty much like
standard
> >> SQL,
> >> >> > > except for declarations of schemas and relations. You will
also
> >> notice
> >> >> > the
> >> >> > > "??" in some schema declarations (e.g., in rule 1. above)
---
> they
> >> >> stand
> >> >> > > for "symbolic" attributes that can represent any attribute.
In
> >> other
> >> >> > words,
> >> >> > > if Cosette can prove that a rule with symbolic attributes
is
> true,
> >> >> then
> >> >> > it
> >> >> > > will be true regardless of what the symbolic attributes are
> >> >> instantiated
> >> >> > > with. Symbolic predicates (e.g., in rule 1.) works similarly,
> hence
> >> >> > giving
> >> >> > > Cosette a mechanism to prove (or disprove) classes of rewrite
> >> rules.
> >> >> See
> >> >> > > our documentation at http://cosette.cs.washington.edu/guide
for
> >> >> details.
> >> >> > >
> >> >> > > I believe the challenge here is how we can "reverse engineer"
the
> >> >> > > intention of each of the existing rules so that they can
be
> >> expressed
> >> >> in
> >> >> > > Cosette. Any suggestions on how to do this? We have a few
> students
> >> >> > working
> >> >> > > on Cosette and can help, but we will probably need help from
> >> Calcite
> >> >> to
> >> >> > > fully understand all of the existing rules. Another possibility
> is
> >> to
> >> >> > print
> >> >> > > out the input and output of each rule application during
testing,
> >> and
> >> >> > send
> >> >> > > them to Cosette. If the printout is in a form that resembles
SQL
> we
> >> >> can
> >> >> > > probably patch it into Cosette.
> >> >> > >
> >> >> > > For new rules, can we can ask Calcite authors to express
them in
> >> >> Cosette
> >> >> > > as well, perhaps as part of the documentation? This way we
will
> >> only
> >> >> need
> >> >> > > to handle the existing rules.
> >> >> > >
> >> >> > > > A few rules might use other information besides the
input
> >> relational
> >> >> > > expression, such as predicates that are known to hold or
column
> >> >> > > combinations that are known to be unique. But let’s see
what
> >> happens.
> >> >> > >
> >> >> > > This is something that we are actively working on. Can you
point
> >> us to
> >> >> > > specific rules with such properties? One possibility is the
join
> >> >> > > commutativity rule noted above. You will notice that we didn't
> >> prove
> >> >> the
> >> >> > > "general form" of the rule with symbolic attributes, but
rather
> one
> >> >> with
> >> >> > > concrete schemas. This is because Cosette currently implements
> the
> >> >> > unnamed
> >> >> > > approach to attribute naming (see Section 3.2 in
> >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf), hence the
> >> general
> >> >> form
> >> >> > > of the rule is only true if we know that the two input schemas
> have
> >> >> > > distinct attributes.
> >> >> > >
> >> >> > > > This is a very loose integration of Cosette / Calcite,
but we
> can
> >> >> make
> >> >> > > closer integrations (e.g. within the same JVM, even at runtime)
> as
> >> we
> >> >> > > discover synergies. After all, optimization and theorem-proving
> are
> >> >> > related
> >> >> > > endeavors.
> >> >> > >
> >> >> > > Agreed. Cosette is implemented using Coq and Racket. We realize
> >> that
> >> >> > those
> >> >> > > are not the most popular languages for implementing systems
:) ,
> so
> >> >> > Cosette
> >> >> > > comes with a POST API as well: http://cosette.cs.washington.
> >> >> > edu/guide#api
> >> >> > > . It takes in the program text written in Cosette, and returns
> the
> >> >> answer
> >> >> > > (or times out). Does this make it easier to run the tool?
We are
> >> open
> >> >> to
> >> >> > > implementing other bindings as well.
> >> >> > >
> >> >> > > > Another area that would be useful would be to devise
test data.
> >> >> > >
> >> >> > > How about this: Each SQL implementation has its own
> interpretation
> >> of
> >> >> > SQL,
> >> >> > > with Cosette being one of them. Let's implement different
SQL
> >> >> semantics
> >> >> > > using Cosette (say, Calcite's and Postgres'). Then, given
a
> query,
> >> ask
> >> >> > > Cosette to find a counterexample (i.e., an input relation)
where
> >> the
> >> >> two
> >> >> > > implementations will return different results when executed
on a
> >> given
> >> >> > > query. If such a counterexample exists, then Calcite developers
> can
> >> >> > > determine whether this is a "bug" or a "feature". Does this
sound
> >> >> similar
> >> >> > > to what you have in mind?
> >> >> > >
> >> >> > > > There might be applications in materialized views. A
query Q
> can
> >> >> use a
> >> >> > > materialized view V if V covers Q. In other words if Q ==
R(V)
> >> where
> >> >> R is
> >> >> > > some sequence of relational operators. Given Q and V, Cosette
> could
> >> >> > perhaps
> >> >> > > analyze and either return R (success) or return that V does
not
> >> cover
> >> >> Q
> >> >> > > (failure).
> >> >> > >
> >> >> > > This resembles the problem of deciding whether a given relation
> >> >> > (expressed
> >> >> > > as a query) is contained in another one. It will take some
work
> for
> >> >> > Cosette
> >> >> > > to be able to handle this but it definitely sounds interesting.
> Do
> >> you
> >> >> > have
> >> >> > > an application in mind? One of them might be to determine
whether
> >> >> > > previously cached results can be used.
> >> >> > >
> >> >> > > We definitely see lots of synergies between the two tools.
To
> start
> >> >> with
> >> >> > > something easy :) , I propose we first discuss how to use
the
> >> current
> >> >> > > Cosette implementation to audit existing Calcite rules, and
a way
> >> to
> >> >> > > integrate Cosette into development of future Calcite rules
as
> part
> >> of
> >> >> > code
> >> >> > > review / regression tests. What do you think?
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Alvin (on behalf of the Cosette team)
> >> >> > >
> >> >> >
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > shumochu.com
> >> >
> >>
> >
> >
> >
> > --
> > shumochu.com
> >
>
>
>
> --
> shumochu.com
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message