calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Mior <mm...@uwaterloo.ca>
Subject Re: Cosette / Apache Calcite
Date Fri, 09 Feb 2018 00:37:06 GMT
Shumo,

Thanks for the additional info. I think it would be great to have these
tests run as part of the Calcite test suite. One question I have is what
the runtime is like for these tests. Especially since using the web API
would result in a dependency on a service we (as in the Calcite PMC) don't
control, I would be inclined to have these tests run separately so they can
be turned off if needed. Per an earlier comment from Julian (
https://issues.apache.org/jira/browse/CALCITE-1977) one option would be to
allow Calcite to generate a script that can be fed to Cosette.

Also, while I'm sure I could reproduce something like the test suite you
currently have in time, if you have code you could share for your current
test setup with Cosette and Calcite that would likely be helpful in moving
this along.

Cheers,
--
Michael Mior
mmior@uwaterloo.ca

2018-02-08 19:13 GMT-05:00 Shumo Chu <shumo.chu@gmail.com>:

> Currently, I hijack the test cases and call RelToSqlConverter before and
> after calcite's rewrite. When I get the query before (Q1) and after (Q2), I
> can then generate Cosette source code, need to add a few things, e.g.
> schema declarations. This Cosette source code is then sent to our solver to
> get the result of equivalence checking.
>
> Now everything is run on my own computer. But Cosette does provide a web
> service that is running on a UW CSE server and has a web API to call (
> http://cosette.cs.washington.edu/guide#api). So calcite can either call
> the
> web API (the advantage is no need to install any dependencies) or try to
> run Cosette locally (we have dockerized the installation of Cosette).
>
> What do you think?
>
> Best
> Shumo
>
> On Thu, Feb 8, 2018 at 3:25 PM, Michael Mior <mmior@uwaterloo.ca> wrote:
>
> > Shumo and Alvin,
> >
> > That's great to hear (especially the part about the lack of bugs)! That
> > paper is next on my stack to read. I'd love to work on getting these
> tests
> > better integrated with Calcite. It would be really helpful if you could
> > provide more detail on how you're currently running the tests. Ideally if
> > you could provide a way for me to easily replicate your test
> > infrastructure, that would probably help give me a better idea on how we
> > can help with the integration.
> >
> > Cheers,
> > --
> > Michael Mior
> > mmior@apache.org
> >
> > 2018-02-08 17:56 GMT-05:00 Shumo Chu <shumo.chu@gmail.com>:
> >
> > > Hi, Calcite Devs!
> > >
> > > We wanted to update you on the current status of using Cosette to help
> > > check the correctness of the Calcite planner rules [
> > > https://issues.apache.org/jira/browse/CALCITE-1977]:
> > >
> > > So far we have used Cosette to check whether Calcite's transformations
> > are
> > > correct by using the test cases in RelOptRulesTest.java.
> > >
> > > For each test case, we use RelToSqlConverter to revert the query plan
> > > before Calcite's rewrite back to SQL (call it Q1) and similarly for the
> > > query plan after Calcite's rewrite (Q2). We then use Cosette to check
> > > whether Q1 and Q2 are semantically equivalent.
> > >
> > > As Calcite's test cases cover many SQL features, some of them are not
> > > supported by Cosette yet. Out of the 39 (give number) test cases that
> use
> > > SQL features supported by Cosette, Cosette is now able to formally
> prove
> > > that Calcite's rewrite in 33 of them are correct. This includes a few
> > > fairly complicated ones, like "testPushFilterPassAggThree" (
> > > https://github.com/apache/calcite/blob/master/core/src/test
> > > /java/org/apache/calcite/test/RelOptRulesTest.java#L408) The good news
> > is
> > > that we haven't found any bugs so far :)
> > >
> > > We have also used the test cases to improve Cosette. For example,
> Cosette
> > > now supports checking equivalence under database integrity constraints
> as
> > > preconditions, e.g., Q1 == Q2 iff R.A is a key of R. In case you are
> > > interested you can read about our latest system at
> > > https://arxiv.org/abs/1802.02229
> > >
> > > We would like to improve Cosette's parser so that we can handle the
> > > remaining Calcite test cases. Meanwhile, we would also like to discuss
> > how
> > > to integrate the two tools in a more systematic way, perhaps as part of
> > the
> > > Calcite test infrastructure rather than us manually extracting the test
> > > cases from RelOptRulesTest (RelToSqlConverter also doesn't always
> succeed
> > > in reversion as you may know). Anyone interested in discussing more? If
> > so
> > > how should we proceed?
> > >
> > > Best
> > > Shumo & Alvin
> > >
> > > On Tue, Oct 17, 2017 at 6:36 PM, Shumo Chu <shumo.chu@gmail.com>
> wrote:
> > >
> > > > I create a link to the image:
> > > >
> > > > https://homes.cs.washington.edu/~chushumo/img/calcite_break_down.png
> > > >
> > > > In terms of integration, what I can imagine is to call Cosette web
> API
> > in
> > > > parallel when running calcite test cases, like RelOptRulesTest, then
> > > > perhaps Cosette result can give developers either extra confidence or
> > > > counterexample if there is a bug.
> > > >
> > > >
> > > >
> > > > On Tue, Oct 17, 2017 at 5:49 PM, Michael Mior <mmior@uwaterloo.ca>
> > > wrote:
> > > >
> > > >> Shumo,
> > > >>
> > > >> Thanks for the update! Unfortunately I don't believe inline images
> and
> > > >> attachments are not supported on ASF mailing lists so if you have
> > > another
> > > >> way of sharing that, would be interested to see. Glad to hear the
> test
> > > >> cases have been useful to you. Also great to see that Calcite seems
> to
> > > be
> > > >> faring well so far :)
> > > >>
> > > >> You're right that the conversion of plans to SQL definitely has some
> > > >> missing pieces. If there are any that are particularly problematic
> in
> > > your
> > > >> with Cosette, please let us know. For any bugs you encounter, it
> would
> > > be
> > > >> great if you could file a JIRA so we can work on fixing these (
> > > >> https://issues.apache.org/jira/projects/CALCITE/issues/).
> > > >>
> > > >> --
> > > >> Michael Mior
> > > >> mmior@apache.org
> > > >>
> > > >> 2017-10-17 20:06 GMT-04:00 Shumo Chu <shumo.chu@gmail.com>:
> > > >>
> > > >> > Hi Michael and Calcite Devs:
> > > >> >
> > > >> > That's great!
> > > >> >
> > > >> > We also hijacked the test harness of RelOpRule Test and generated
> > SQL
> > > >> > using before and after optimization logical plans. (The reason
> that
> > I
> > > >> use
> > > >> > query plan rather than the input SQL is that when Calcite
> generates
> > > SQLs
> > > >> > from query plans, it adds the correct table reference of
> attributes.
> > > >> > Cosette currently cannot handle unqualified attribute references).
> > We
> > > >> got
> > > >> > 232 before and after SQL after some tweaking. You can find the
SQL
> > > >> queries
> > > >> > that we get here: https://github.com/uwdb/Cosett
> > > >> e/blob/master/examples/c
> > > >> > alcite/calcite_tests.json.
> > > >> >
> > > >> > We then ran all of them using Cosette. Cosette can currently
> support
> > > 39
> > > >> of
> > > >> > 232 queries. Out of these 39 queries, the results are:
> > > >> >
> > > >> > EQ           9
> > > >> > UNKNOWN     27
> > > >> > NEQ          3
> > > >> >
> > > >> > EQ means our Coq backend found a valid proof for the equivalence
> of
> > > the
> > > >> > two queries.
> > > >> >
> > > >> > UNKNOWN means our model checker could not find a counterexample
to
> > > prove
> > > >> > that the queries are not equivalent (within a time bound of 3
> secs).
> > > You
> > > >> > can interpret that as "likely equal".
> > > >> >
> > > >> > NEQ means our model checker found a counterexample to prove the
> > > queries
> > > >> > are not equivalent. These are all the cases that there are some
> > > >> > preconditions that Calcite understands but we haven't yet put
into
> > > >> Cosette.
> > > >> > For example (testPushSemiJoinPastJoinRuleLeft):
> > > >> >
> > > >> >  query q1 `SELECT EMP.ENAME
> > > >> >                   FROM EMP AS EMP, DEPT AS DEPT, EMP AS EMP0
> > > >> >                    WHERE EMP.DEPTNO = DEPT.DEPTNO AND
> > > >> >                                  EMP.EMPNO = EMP0.EMPNO`;
> > > >> >
> > > >> > query q2 `SELECT EMP1.ENAME
> > > >> >                  FROM EMP AS EMP1 INNER JOIN DEPT AS DEPT0
> > > >> >                     ON EMP1.DEPTNO = DEPT0.DEPTNO INNER JOIN
EMP
> AS
> > > EMP2
> > > >> >                       ON EMP1.EMPNO = EMP2.EMPNO INNER JOIN DEPT
> AS
> > > >> DEPT1
> > > >> >                     ON EMP1.DEPTNO = DEPT1.DEPTNO INNER JOIN
EMP
> AS
> > > EMP3
> > > >> >                     ON EMP1.EMPNO = EMP3.EMPNO`;
> > > >> >
> > > >> >
> > > >> > These two queries are equivalent under the precondition that
EMPNO
> > is
> > > >> the
> > > >> > primary key of EMP, and DEPTNO is the primary key of DEPT, and
> that
> > is
> > > >> > indeed the case!
> > > >> >
> > > >> > We are extending Cosette to support preconditions such as primary
> > keys
> > > >> and
> > > >> > foreign keys now and we should be able to handle these soon.
> > > >> >
> > > >> > For the queries that we cannot handle, here is the break down
of
> the
> > > >> > issues:
> > > >> >
> > > >> > [image: Inline image 1]
> > > >> >
> > > >> > We support a large part of group by queries, GROUPBY here means
> some
> > > >> > grouping features that we don't support, e.g., group by on
> arbitrary
> > > >> > expressions. Although we might not be able to support all of
these
> > SQL
> > > >> > features, we are definitely working on adding more SQL features
to
> > > >> Cosette.
> > > >> >
> > > >> > With all above, we can try to integrate Cosette as an extra layer
> to
> > > >> > ensure developer's confidence. Currently, we are working on SIGMOD
> > > >> deadline
> > > >> > (Nov. 2) but will devote more time into this afterward.
> > > >> >
> > > >> > One thing worth mentioning is that we plan to include all these
> > > calcite
> > > >> > examples as the regression test for Cosette so that we can make
> > > Cosette
> > > >> > more powerful and practical. These queries are great benchmarks
> for
> > > >> Cosette
> > > >> > and are super valuable!
> > > >> >
> > > >> > Meanwhile, It would be also great if calcite's logical plan to
SQL
> > > >> > converter can be improved. It currently doesn't support all
> queries
> > > and
> > > >> in
> > > >> > some cases, it actually generates illegal SQL. (happy to submit
> > > detailed
> > > >> > bug report if you think that would be helpful).
> > > >> >
> > > >> > Best
> > > >> > Shumo
> > > >> >
> > > >> > On Tue, Oct 17, 2017 at 9:43 AM, Michael Mior <mmior@uwaterloo.ca
> >
> > > >> wrote:
> > > >> >
> > > >> >> This took me longer than expected to get around to, but hopefully
> > the
> > > >> >> below
> > > >> >> is helpful:
> > > >> >>
> > > >> >> https://gist.github.com/fd2d2db412c11ec4d901925548f85ef2
> > > >> >>
> > > >> >> I just did some basic (and very hacky) instrumentation of
> > > >> RelOptRulesTest
> > > >> >> to dump SQL before and after rules have been applied. The
file
> > > >> consists of
> > > >> >> the name of the test followed by the original and then the
> > rewritten
> > > >> SQL.
> > > >> >>
> > > >> >> Many of the tests are missing for various reasons, but there's
> > still
> > > >> 189
> > > >> >> examples there to play with. Let me know if any particular
> aspects
> > of
> > > >> the
> > > >> >> SQL are problematic. The "before" SQL is handwritten for
the
> tests
> > > and
> > > >> the
> > > >> >> "after" is ANSI SQL as generated by Calcite from the resulting
> > > logical
> > > >> >> plan.
> > > >> >>
> > > >> >> --
> > > >> >> Michael Mior
> > > >> >> mmior@apache.org
> > > >> >>
> > > >> >> 2017-09-22 1:03 GMT-04:00 Amogh Margoor <amoghm@qubole.com>:
> > > >> >>
> > > >> >> > >>> There might be applications in materialized
views. A query
> Q
> > > can
> > > >> >> use a
> > > >> >> > materialized view V if V covers Q. In other words if
>>>Q ==
> R(V)
> > > >> where
> > > >> >> R
> > > >> >> > is some sequence of relational operators. Given Q and
V,
> Cosette
> > > >> could
> > > >> >> > perhaps analyze and either >>>return R (success)
or return
> that V
> > > >> does
> > > >> >> not
> > > >> >> > cover Q (failure).
> > > >> >> >
> > > >> >> > >>This resembles the problem of deciding whether
a given
> relation
> > > >> >> > (expressed as a query) is contained in another one.
It will
> > >>take
> > > >> some
> > > >> >> > work for Cosette to be able to handle this but it definitely
> > sounds
> > > >> >> > interesting. Do you have an application in mind? >>One
of them
> > > might
> > > >> be
> > > >> >> to
> > > >> >> > determine whether previously cached results can be used.
> > > >> >> >
> > > >> >> > One simple idea to start here is to replace a naive
solver we
> > have
> > > in
> > > >> >> > Calcite for checking if one predicate implies another
> predicate.
> > We
> > > >> >> call it
> > > >> >> > RexImplicationChecker in Calcite and if we can replace
or help
> it
> > > >> with
> > > >> >> > Constraint solver of Cosette which says if a particular
> > implication
> > > >> is a
> > > >> >> > tautology then that would help a great deal.
> > > >> >> >
> > > >> >> >
> > > >> >> >
> > > >> >> > On Tue, Sep 19, 2017 at 9:29 PM, Alvin Cheung <
> > > >> >> akcheung@cs.washington.edu>
> > > >> >> > wrote:
> > > >> >> >
> > > >> >> > > Hi Julian et al,
> > > >> >> > >
> > > >> >> > > Thanks for your interest in Cosette. Your suggestions
make a
> > lot
> > > of
> > > >> >> > sense.
> > > >> >> > > We have done some initial work and would like to
get your
> > > feedback
> > > >> on
> > > >> >> how
> > > >> >> > > to integrate the two tools together.
> > > >> >> > >
> > > >> >> > > > One obvious idea is to use Cosette to audit
Calcite’s query
> > > >> >> > > transformation rules. Each rule is supposed to
preserve
> > semantics
> > > >> but
> > > >> >> > > (until Cosette) we had to trust the author of the
rule. We
> > could
> > > >> >> convert
> > > >> >> > > the before and after relational expressions to
SQL, and then
> > ask
> > > >> >> Cosette
> > > >> >> > > whether those are equivalent. We could enable this
check in
> > > >> Calcite’s
> > > >> >> > test
> > > >> >> > > suite, during which many thousands of rules are
fired.
> > > >> >> > >
> > > >> >> > > Indeed. We have browsed through the Calcite rules
and
> > > reformulated
> > > >> a
> > > >> >> few
> > > >> >> > > of them using our Cosette language:
> > > >> >> > >
> > > >> >> > > 1. Conjunctive select (https://github.com/apache/cal
> > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > >> >> > > rules/FilterMergeRule.java) -->
> https://demo.cosette.cs.washin
> > > >> >> gton.edu/
> > > >> >> > > (click conjunctive select from the dropdown menu)
> > > >> >> > >
> > > >> >> > > 2. Join commute (https://github.com/apache/cal
> > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > >> >> > > rules/JoinCommuteRule.java) --> Join commute
from the demo
> > > website
> > > >> >> above
> > > >> >> > >
> > > >> >> > > 3. Join/Project transpose (https://github.com/apache/cal
> > > >> >> > > cite/blob/master/core/src/main/java/org/apache/calcite/rel/
> > > >> >> > > rules/JoinProjectTransposeRule.java) --> Join
Proj. Trans.
> > from
> > > >> the
> > > >> >> demo
> > > >> >> > > website above
> > > >> >> > >
> > > >> >> > > As we are not very familiar with the Calcite code
base, we
> have
> > > >> tried
> > > >> >> our
> > > >> >> > > best to guess the intention of each rule based
on the
> > > >> documentation,
> > > >> >> > please
> > > >> >> > > feel free to point out if we made mistakes.
> > > >> >> > >
> > > >> >> > > As you can see, the Cosette language is pretty
much like
> > standard
> > > >> SQL,
> > > >> >> > > except for declarations of schemas and relations.
You will
> also
> > > >> notice
> > > >> >> > the
> > > >> >> > > "??" in some schema declarations (e.g., in rule
1. above) ---
> > > they
> > > >> >> stand
> > > >> >> > > for "symbolic" attributes that can represent any
attribute.
> In
> > > >> other
> > > >> >> > words,
> > > >> >> > > if Cosette can prove that a rule with symbolic
attributes is
> > > true,
> > > >> >> then
> > > >> >> > it
> > > >> >> > > will be true regardless of what the symbolic attributes
are
> > > >> >> instantiated
> > > >> >> > > with. Symbolic predicates (e.g., in rule 1.) works
similarly,
> > > hence
> > > >> >> > giving
> > > >> >> > > Cosette a mechanism to prove (or disprove) classes
of rewrite
> > > >> rules.
> > > >> >> See
> > > >> >> > > our documentation at http://cosette.cs.washington.edu/guide
> > for
> > > >> >> details.
> > > >> >> > >
> > > >> >> > > I believe the challenge here is how we can "reverse
engineer"
> > the
> > > >> >> > > intention of each of the existing rules so that
they can be
> > > >> expressed
> > > >> >> in
> > > >> >> > > Cosette. Any suggestions on how to do this? We
have a few
> > > students
> > > >> >> > working
> > > >> >> > > on Cosette and can help, but we will probably need
help from
> > > >> Calcite
> > > >> >> to
> > > >> >> > > fully understand all of the existing rules. Another
> possibility
> > > is
> > > >> to
> > > >> >> > print
> > > >> >> > > out the input and output of each rule application
during
> > testing,
> > > >> and
> > > >> >> > send
> > > >> >> > > them to Cosette. If the printout is in a form that
resembles
> > SQL
> > > we
> > > >> >> can
> > > >> >> > > probably patch it into Cosette.
> > > >> >> > >
> > > >> >> > > For new rules, can we can ask Calcite authors to
express them
> > in
> > > >> >> Cosette
> > > >> >> > > as well, perhaps as part of the documentation?
This way we
> will
> > > >> only
> > > >> >> need
> > > >> >> > > to handle the existing rules.
> > > >> >> > >
> > > >> >> > > > A few rules might use other information besides
the input
> > > >> relational
> > > >> >> > > expression, such as predicates that are known to
hold or
> column
> > > >> >> > > combinations that are known to be unique. But let’s
see what
> > > >> happens.
> > > >> >> > >
> > > >> >> > > This is something that we are actively working
on. Can you
> > point
> > > >> us to
> > > >> >> > > specific rules with such properties? One possibility
is the
> > join
> > > >> >> > > commutativity rule noted above. You will notice
that we
> didn't
> > > >> prove
> > > >> >> the
> > > >> >> > > "general form" of the rule with symbolic attributes,
but
> rather
> > > one
> > > >> >> with
> > > >> >> > > concrete schemas. This is because Cosette currently
> implements
> > > the
> > > >> >> > unnamed
> > > >> >> > > approach to attribute naming (see Section 3.2 in
> > > >> >> > > http://webdam.inria.fr/Alice/pdfs/Chapter-3.pdf),
hence the
> > > >> general
> > > >> >> form
> > > >> >> > > of the rule is only true if we know that the two
input
> schemas
> > > have
> > > >> >> > > distinct attributes.
> > > >> >> > >
> > > >> >> > > > This is a very loose integration of Cosette
/ Calcite, but
> we
> > > can
> > > >> >> make
> > > >> >> > > closer integrations (e.g. within the same JVM,
even at
> runtime)
> > > as
> > > >> we
> > > >> >> > > discover synergies. After all, optimization and
> theorem-proving
> > > are
> > > >> >> > related
> > > >> >> > > endeavors.
> > > >> >> > >
> > > >> >> > > Agreed. Cosette is implemented using Coq and Racket.
We
> realize
> > > >> that
> > > >> >> > those
> > > >> >> > > are not the most popular languages for implementing
systems
> :)
> > ,
> > > so
> > > >> >> > Cosette
> > > >> >> > > comes with a POST API as well: http://cosette.cs.washington.
> > > >> >> > edu/guide#api
> > > >> >> > > . It takes in the program text written in Cosette,
and
> returns
> > > the
> > > >> >> answer
> > > >> >> > > (or times out). Does this make it easier to run
the tool? We
> > are
> > > >> open
> > > >> >> to
> > > >> >> > > implementing other bindings as well.
> > > >> >> > >
> > > >> >> > > > Another area that would be useful would be
to devise test
> > data.
> > > >> >> > >
> > > >> >> > > How about this: Each SQL implementation has its
own
> > > interpretation
> > > >> of
> > > >> >> > SQL,
> > > >> >> > > with Cosette being one of them. Let's implement
different SQL
> > > >> >> semantics
> > > >> >> > > using Cosette (say, Calcite's and Postgres'). Then,
given a
> > > query,
> > > >> ask
> > > >> >> > > Cosette to find a counterexample (i.e., an input
relation)
> > where
> > > >> the
> > > >> >> two
> > > >> >> > > implementations will return different results when
executed
> on
> > a
> > > >> given
> > > >> >> > > query. If such a counterexample exists, then Calcite
> developers
> > > can
> > > >> >> > > determine whether this is a "bug" or a "feature".
Does this
> > sound
> > > >> >> similar
> > > >> >> > > to what you have in mind?
> > > >> >> > >
> > > >> >> > > > There might be applications in materialized
views. A query
> Q
> > > can
> > > >> >> use a
> > > >> >> > > materialized view V if V covers Q. In other words
if Q ==
> R(V)
> > > >> where
> > > >> >> R is
> > > >> >> > > some sequence of relational operators. Given Q
and V, Cosette
> > > could
> > > >> >> > perhaps
> > > >> >> > > analyze and either return R (success) or return
that V does
> not
> > > >> cover
> > > >> >> Q
> > > >> >> > > (failure).
> > > >> >> > >
> > > >> >> > > This resembles the problem of deciding whether
a given
> relation
> > > >> >> > (expressed
> > > >> >> > > as a query) is contained in another one. It will
take some
> work
> > > for
> > > >> >> > Cosette
> > > >> >> > > to be able to handle this but it definitely sounds
> interesting.
> > > Do
> > > >> you
> > > >> >> > have
> > > >> >> > > an application in mind? One of them might be to
determine
> > whether
> > > >> >> > > previously cached results can be used.
> > > >> >> > >
> > > >> >> > > We definitely see lots of synergies between the
two tools. To
> > > start
> > > >> >> with
> > > >> >> > > something easy :) , I propose we first discuss
how to use the
> > > >> current
> > > >> >> > > Cosette implementation to audit existing Calcite
rules, and a
> > way
> > > >> to
> > > >> >> > > integrate Cosette into development of future Calcite
rules as
> > > part
> > > >> of
> > > >> >> > code
> > > >> >> > > review / regression tests. What do you think?
> > > >> >> > >
> > > >> >> > > Thanks,
> > > >> >> > > Alvin (on behalf of the Cosette team)
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > shumochu.com
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > shumochu.com
> > > >
> > >
> > >
> > >
> > > --
> > > shumochu.com
> > >
> >
>
>
>
> --
> shumochu.com
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message