calcite-dev mailing list archives

From Haisheng Yuan <hy...@apache.org>
Subject Re: Problems on HerdDB with 1.23 was [VOTE] Release apache-calcite-1.23.0 (release candidate 0)
Date Tue, 12 May 2020 14:38:42 GMT
Hi Enrico,

Thanks for reporting issues with calcite-1.23.0-rc0 so quickly. I appreciate it.
Can you log JIRA issues for these? We will fix them.

Regarding the first issue, I think several factors contribute to it.
1. Trait enforcement is enabled for EnumerableConvention by default in 1.23.0, so the planner
can now generate merge joins. We can disable it again if people prefer.

2. RelBuilder does not yet handle the traits of physical operators well, especially for
Project.

3. Logical operators are doing work that they are not expected to do and that physical operators
should do instead. Here, when a LogicalProject is created, it tries to deduce its collation from
the input MergeJoin. Project is a frequently created operator, but the profiler shows that
RelTraitSet.replaceIfs() takes 65% of the total runtime of LogicalProject.create(). That is not
only an inappropriate operation but also a waste of time.

4. Transformation rules can match physical operators. In this case, JoinPushExpressionsRule
matched EnumerableMergeJoin, but the rule can't deal with physical operators well, because
the traits are not properly handled. This happens not only with JoinPushExpressionsRule: if you
tweak the query, you might see a similar assertion error when applying FilterIntoJoinRule.
The problem has been there since the rules' inception, but it was only exposed today by HerdDB.
Does that mean no one intentionally uses Calcite's default rule implementations to match
trait-aware physical operators? Can we safely stop matching physical operators in these rules?
(ProjectMerge can be an exception; some people use it on physical operators for post-processing.)
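
A minimal sketch of point 4, assuming a toy model rather than Calcite's own classes. The
Convention enum, Input, Join, pushExpressions and ruleMatches below are all hypothetical
stand-ins, and sortedOnKeys is only a simplified version of the idea behind the "left input
is not sorted on left keys" assertion. It shows how a rewrite that rebuilds a physical merge
join over a new, unsorted input breaks the join's precondition, and how a convention guard
would keep the rule away from physical operators.

```java
import java.util.Collections;
import java.util.List;

/** Toy model only; none of these types are Calcite classes. */
final class MergeJoinGuardSketch {
  /** NONE stands for a logical operator, ENUMERABLE for a physical one. */
  enum Convention { NONE, ENUMERABLE }

  /** An input described only by the column indexes it is sorted on. */
  static final class Input {
    final List<Integer> sortedOn;
    Input(List<Integer> sortedOn) { this.sortedOn = sortedOn; }
  }

  /** A join with a convention, a left input and the left join keys. */
  static final class Join {
    final Convention convention;
    final Input left;
    final List<Integer> leftKeys;
    Join(Convention convention, Input left, List<Integer> leftKeys) {
      this.convention = convention;
      this.left = left;
      this.leftKeys = leftKeys;
    }
  }

  /** Simplified check: the keys must be a prefix of the sort columns. */
  static boolean sortedOnKeys(Input input, List<Integer> keys) {
    return keys.size() <= input.sortedOn.size()
        && input.sortedOn.subList(0, keys.size()).equals(keys);
  }

  /** A logical join needs no ordering; a physical merge join does. */
  static boolean isValid(Join join) {
    return join.convention == Convention.NONE
        || sortedOnKeys(join.left, join.leftKeys);
  }

  /**
   * Hypothetical rewrite: puts a new projection under the join. The new
   * input carries no collation, and the join keeps its convention.
   */
  static Join pushExpressions(Join join) {
    Input projected = new Input(Collections.<Integer>emptyList());
    return new Join(join.convention, projected, join.leftKeys);
  }

  /** The guard in question: only match logical joins. */
  static boolean ruleMatches(Join join) {
    return join.convention == Convention.NONE;
  }

  public static void main(String[] args) {
    Join merge = new Join(Convention.ENUMERABLE,
        new Input(Collections.singletonList(0)), Collections.singletonList(0));
    System.out.println(isValid(merge));                  // true
    System.out.println(isValid(pushExpressions(merge))); // false
    System.out.println(ruleMatches(merge));              // false
  }
}
```

In this toy model the rewritten merge join is invalid because its new left input is
unsorted, while the same rewrite on a logical join stays valid. Whether such a guard is
acceptable for the real rules is exactly the open question above.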

Thanks,
Haisheng


On 2020/05/12 09:10:31, Enrico Olivelli <eolivelli@gmail.com> wrote: 
> Haisheng,
> I am sorry, I have a couple of problems with HerdDB.
> 
> 1) JOIN order unsorted columns in presence of a WHERE over other columns
> This is my case:
> 
> CREATE TABLE tblspace1.table1 (k1 string primary key,n1 int,s1 string)
> CREATE TABLE tblspace1.table3 (k1 string primary key,n3 int,s3 string)
> SELECT t1.k1 as first, t2.k1 as second
> FROM            tblspace1.table1 t1
>  INNER JOIN tblspace1.table3 t2 ON t1.k1=t2.k1
>  WHERE t1.n1 + 1 = t2.n3
> 
> In this case for table1 and table3 no column is physically sorted (no
> column with a collation)
> 
> I have this Planner error:
> java.lang.AssertionError: cannot merge join: left input is not sorted on
> left keys
> at
> org.apache.calcite.rel.metadata.RelMdCollation.mergeJoin(RelMdCollation.java:457)
> at
> org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:153)
> at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
> at GeneratedMetadataHandler_Collation.collations(Unknown Source)
> at
> org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:539)
> at
> org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:273)
> at
> org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:122)
> at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:242)
> at
> org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:121)
> at
> org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:111)
> at
> org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:172)
> at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1464)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1258)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1230)
> at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1219)
> at
> org.apache.calcite.plan.RelOptUtil.pushDownJoinConditions(RelOptUtil.java:3620)
> at
> org.apache.calcite.rel.rules.JoinPushExpressionsRule.onMatch(JoinPushExpressionsRule.java:59)
> at
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:221)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:519)
> at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:535)
> at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:292)
> 
> *If I remove the "WHERE" clause then no error is reported.*
> We have many other test cases about JOINs, and this one is the only one
> that fails.
> 
> This is the failing test case on HerdDB
> https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/core/SimpleJoinTest.java#L522
> 
> We are using the default set of rules Programs.ofRules(Programs.RULE_SET)
> 
> I will try to create a reproducer in Calcite core test suite, in order to
> understand if it is a bug in HerdDB or in Calcite
> but I am reporting the problem as early as possible.
> We wanted to create a daily job that tests HerdDB against current Calcite
> master, but unfortunately we still have not found the time to do it.
> 
> 2) Changed the data type of sum(N) from BIGINT to INTEGER
> 
> I also noticed that the type of sum(N), where N is an INTEGER column,
> is now sometimes reported by Calcite as INTEGER and sometimes as
> BIGINT. In 1.22 it is always reported as BIGINT.
> So we have another test failing.
> 
> SELECT sum(n1), count(*) as cc, k1
> FROM tblspace1.tsql
> GROUP by k1
> ORDER BY sum(n1)
> 
> Here sum(n1) is now reported as an INTEGER; previously it was a BIGINT. I would
> prefer to see it as a BIGINT in order to prevent overflows
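
The overflow concern can be illustrated with plain Java arithmetic. This is only a sketch:
SumTypeSketch and its methods are hypothetical, Java int and long stand in for INTEGER and
BIGINT, and how Calcite or HerdDB actually behave on overflow is not stated in this thread.

```java
/** Toy comparison of a 32-bit and a 64-bit accumulator; not Calcite code. */
final class SumTypeSketch {
  /** Sums into an int, like a SUM typed as INTEGER; silently wraps on overflow. */
  static int sumAsInteger(int[] values) {
    int acc = 0;
    for (int v : values) {
      acc += v;
    }
    return acc;
  }

  /** Sums into a long, like a SUM typed as BIGINT. */
  static long sumAsBigint(int[] values) {
    long acc = 0L;
    for (int v : values) {
      acc += v;
    }
    return acc;
  }

  /** Sums into an int but throws ArithmeticException instead of wrapping. */
  static int sumAsIntegerChecked(int[] values) {
    int acc = 0;
    for (int v : values) {
      acc = Math.addExact(acc, v);
    }
    return acc;
  }

  public static void main(String[] args) {
    int[] values = {Integer.MAX_VALUE, 1};
    System.out.println(sumAsInteger(values)); // -2147483648
    System.out.println(sumAsBigint(values));  // 2147483648
  }
}
```

Two INTEGER values are enough to exceed the 32-bit range, which is why a wider result type
for sum(N) is the safer default.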
> 
> Here are the plans:
> INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP
> by k1 ORDER BY sum(n1) -- Logical Plan
> LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost =
> {10.525000095367432 rows, 37.0 cpu, 0.0 io}, id = 1038
>   LogicalProject(EXPR$0=[$1], CC=[$2], K1=[$0]): rowcount = 2.0, cumulative
> cost = {8.525000095367432 rows, 13.0 cpu, 0.0 io}, id = 1037
>     LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)], CC=[COUNT()]): rowcount
> = 2.0, cumulative cost = {6.525000095367432 rows, 7.0 cpu, 0.0 io}, id =
> 1035
>       LogicalProject(K1=[$0], n1=[$1]): rowcount = 2.0, cumulative cost =
> {4.0 rows, 7.0 cpu, 0.0 io}, id = 1034
>         LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0,
> cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1032
> 
> May 12, 2020 11:07:37 AM herddb.sql.CalcitePlanner runPlanner
> INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP
> by k1 ORDER BY sum(n1) -- Best  Plan
> EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost =
> {5.0 rows, 31.0 cpu, 0.0 io}, id = 1245
>   EnumerableProject(EXPR$0=[$1], CC=[1:BIGINT], K1=[$0]): rowcount = 2.0,
> cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io}, id = 1244
>     EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0
> cpu, 0.0 io}, id = 1243
>       BindableTableScan(table=[[tblspace1, tsql]], projects=[[0, 1]]):
> rowcount = 2.0, cumulative cost = {0.016 rows, 0.024 cpu, 0.0 io}, id = 1055
> 
> 
> Within the same test case, with the same tables, the result of this query
> has not changed:
> SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql
> INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM
> tblspace1.tsql -- Logical Plan
> LogicalAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]):
> rowcount = 1.0, cumulative cost = {5.387500047683716 rows, 5.0 cpu, 0.0
> io}, id = 1253
>   LogicalProject(n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 5.0
> cpu, 0.0 io}, id = 1252
>     LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative
> cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1250
> 
> May 12, 2020 11:08:48 AM herddb.sql.CalcitePlanner runPlanner
> INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM
> tblspace1.tsql -- Best  Plan
> EnumerableAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]):
> rowcount = 1.0, cumulative cost = {2.387500047683716 rows, 1.0 cpu, 0.0
> io}, id = 1295
>   EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0
> cpu, 0.0 io}, id = 1294
>     BindableTableScan(table=[[tblspace1, tsql]], projects=[[1]]): rowcount
> = 2.0, cumulative cost = {0.012 rows, 0.018000000000000002 cpu, 0.0 io}, id
> = 1265
> 
> This is the test on HerdDB
> https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/sql/SimplerPlannerTest.java#L237
> 
> I hope that helps
> Enrico
> 
> 
> On Tue, 12 May 2020 at 07:59, Haisheng Yuan <hyuan@apache.org> wrote:
> 
> > Hi all,
> >
> > I have created a build for Apache Calcite 1.23.0, release
> > candidate 0.
> >
> > Thanks to everyone who has contributed to this release.
> >
> > You can read the release notes here:
> >
> > https://github.com/apache/calcite/blob/calcite-1.23.0-rc0/site/_docs/history.md
> >
> > The commit to be voted upon:
> >
> > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=edc37c0a21344a48b15877788e082c8acdc7b030
> >
> > Its hash is edc37c0a21344a48b15877788e082c8acdc7b030
> >
> > Tag:
> > https://github.com/apache/calcite/tree/calcite-1.23.0-rc0
> >
> > The artifacts to be voted on are located here:
> > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.23.0-rc0
> > (revision 39385)
> >
> > The hashes of the artifacts are as follows:
> >
> > 7482b0bb76e672a15bbe846f2dbdc125bd0f3d8a32abf0ea9159b5db0ab2a2d1182e19b408098ecd68d7cc9ff5d7812ea0b33e4aeac818d191b695d437fa1a94
> > *apache-calcite-1.23.0-src.tar.gz
> >
> > A staged Maven repository is available for review at:
> >
> > https://repository.apache.org/content/repositories/orgapachecalcite-1088/org/apache/calcite/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/hyuan.asc
> > https://dist.apache.org/repos/dist/release/calcite/KEYS
> >
> > N.B.
> > To create the jars and test Apache Calcite: "./gradlew build".
> >
> > If you do not have a Java environment available, you can run the tests
> > using docker. To do so, install docker and docker-compose, then run
> > "docker-compose run test" from the root of the directory.
> >
> > Please vote on releasing this package as Apache Calcite 1.23.0.
> >
> > The vote is open for the next 72 hours and passes if a majority of at
> > least three +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Calcite 1.23.0
> > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > [ ] -1 Do not release this package because...
> >
> >
> > Here is my vote:
> >
> > +1 (binding)
> >
> > Thanks,
> > Haisheng Yuan
> >
> 
