Haisheng,
I am sorry, I have a couple of problems with HerdDB.
1) JOIN over unsorted columns in the presence of a WHERE clause over other columns
This is my case:
CREATE TABLE tblspace1.table1 (k1 string primary key,n1 int,s1 string)
CREATE TABLE tblspace1.table3 (k1 string primary key,n3 int,s3 string)
SELECT t1.k1 as first, t2.k1 as second
FROM tblspace1.table1 t1
INNER JOIN tblspace1.table3 t2 ON t1.k1=t2.k1
WHERE t1.n1 + 1 = t2.n3
In this case no column of table1 or table3 is physically sorted (no
column has a collation).
I get this planner error:
java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
    at org.apache.calcite.rel.metadata.RelMdCollation.mergeJoin(RelMdCollation.java:457)
    at org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:153)
    at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
    at GeneratedMetadataHandler_Collation.collations(Unknown Source)
    at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:539)
    at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:273)
    at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:122)
    at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:242)
    at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:121)
    at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:111)
    at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:172)
    at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1464)
    at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1258)
    at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1230)
    at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1219)
    at org.apache.calcite.plan.RelOptUtil.pushDownJoinConditions(RelOptUtil.java:3620)
    at org.apache.calcite.rel.rules.JoinPushExpressionsRule.onMatch(JoinPushExpressionsRule.java:59)
    at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:221)
    at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:519)
    at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:535)
    at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:292)
*If I remove the "WHERE" clause then no error is reported.*
We have many other test cases about JOINs, and this is the only one
that fails.
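For context on the assertion message, here is a minimal plain-Java sketch of a textbook merge join and of the precondition it relies on. It is not Calcite's implementation; the class and method names are hypothetical, and it assumes the join keys are unique within each input (as with a primary key).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a textbook merge join over two lists of join keys.
// This is NOT Calcite code; all names here are hypothetical.
final class MergeJoinSketch {

    // True if the keys are in non-decreasing order.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    // Returns the keys present in both inputs, assuming keys are unique
    // within each input. Both inputs must already be sorted on the key.
    static List<String> mergeJoin(List<String> left, List<String> right) {
        if (!isSorted(left)) {
            throw new IllegalArgumentException("left input is not sorted on left keys");
        }
        if (!isSorted(right)) {
            throw new IllegalArgumentException("right input is not sorted on right keys");
        }
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                out.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller: it cannot match any later right key
            } else {
                j++; // right key is smaller: it cannot match any later left key
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mergeJoin(List.of("a", "b", "d"), List.of("b", "c", "d")));
        try {
            // An unsorted left input is rejected up front.
            mergeJoin(List.of("d", "a", "b"), List.of("b", "c", "d"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The single forward pass is only correct when both sides arrive in key order, which is why a planner would need a sort (or a known collation) below a merge join.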
This is the failing test case on HerdDB
https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/core/SimpleJoinTest.java#L522
We are using the default set of rules, Programs.ofRules(Programs.RULE_SET).
I will try to create a reproducer in the Calcite core test suite, in order
to understand whether it is a bug in HerdDB or in Calcite,
but I am reporting the problem as early as possible.
We wanted to create a daily job that tests HerdDB against current Calcite
master, but unfortunately we still have not found the time to do it.
2) The data type of sum(N) changed from BIGINT to INTEGER
I also noticed that the type of sum(N), where N is an INTEGER column, is
now sometimes reported by Calcite as INTEGER and sometimes as BIGINT.
In 1.22 it was always reported as BIGINT.
So we have another test failing.
SELECT sum(n1), count(*) as cc, k1
FROM tblspace1.tsql
GROUP by k1
ORDER BY sum(n1)
Here sum(n1) is now reported as an INTEGER; previously it was a BIGINT. I
would prefer to see it as a BIGINT in order to prevent overflows.
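To make the overflow concern concrete, here is a small plain-Java sketch. It only shows Java's int and long arithmetic; I am assuming, not asserting, that an INTEGER-typed SUM would be evaluated with 32-bit arithmetic. The class and method names are hypothetical.

```java
// Illustrative only: a 32-bit accumulator wraps where a 64-bit one does not.
// This shows plain Java arithmetic, not how Calcite or HerdDB evaluate SUM.
final class SumOverflowSketch {

    // Accumulates in an int, as an INTEGER result type would suggest.
    // Java int addition wraps silently on overflow.
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Accumulates in a long, as a BIGINT result type would suggest.
    static long sumAsLong(int[] values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] n1 = {Integer.MAX_VALUE, 1};
        System.out.println(sumAsInt(n1));  // wraps to Integer.MIN_VALUE
        System.out.println(sumAsLong(n1)); // 2147483648
    }
}
```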
Here are the plans:
INFO: Query: SELECT sum(n1), count(*) as cc, k1 FROM tblspace1.tsql GROUP by k1 ORDER BY sum(n1) -- Logical Plan
LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = {10.525000095367432 rows, 37.0 cpu, 0.0 io}, id = 1038
  LogicalProject(EXPR$0=[$1], CC=[$2], K1=[$0]): rowcount = 2.0, cumulative cost = {8.525000095367432 rows, 13.0 cpu, 0.0 io}, id = 1037
    LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)], CC=[COUNT()]): rowcount = 2.0, cumulative cost = {6.525000095367432 rows, 7.0 cpu, 0.0 io}, id = 1035
      LogicalProject(K1=[$0], n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 7.0 cpu, 0.0 io}, id = 1034
        LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1032
May 12, 2020 11:07:37 AM herddb.sql.CalcitePlanner runPlanner
INFO: Query: SELECT sum(n1), count(*) as cc, k1 FROM tblspace1.tsql GROUP by k1 ORDER BY sum(n1) -- Best Plan
EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = {5.0 rows, 31.0 cpu, 0.0 io}, id = 1245
  EnumerableProject(EXPR$0=[$1], CC=[1:BIGINT], K1=[$0]): rowcount = 2.0, cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io}, id = 1244
    EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1243
      BindableTableScan(table=[[tblspace1, tsql]], projects=[[0, 1]]): rowcount = 2.0, cumulative cost = {0.016 rows, 0.024 cpu, 0.0 io}, id = 1055
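Reading the Best Plan for this query, the aggregate seems to have been replaced by a plain projection (EXPR$0=[$1], CC=[1:BIGINT]). That would be valid if k1 is unique in tsql, which is my assumption since the table definition is not shown here; in that case sum(n1) is just n1, which may be why its type now follows the column type. The sketch below, with hypothetical names, only checks that equivalence in plain Java and says nothing about which Calcite rule is involved.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: GROUP BY on a unique key versus a plain projection.
// Names are hypothetical; each result entry maps k1 -> [sum(n1), count(*)].
final class UniqueKeyAggregateSketch {

    // SELECT k1, sum(n1), count(*) ... GROUP BY k1
    static Map<String, List<Long>> aggregate(String[] k1, int[] n1) {
        Map<String, List<Long>> out = new LinkedHashMap<>();
        for (int i = 0; i < k1.length; i++) {
            long value = n1[i];
            out.merge(k1[i], List.of(value, 1L),
                (a, b) -> List.of(a.get(0) + b.get(0), a.get(1) + b.get(1)));
        }
        return out;
    }

    // SELECT k1, n1, 1 ... with no aggregate at all.
    // A later row with the same key overwrites the earlier one, so this
    // only matches aggregate() when k1 is unique.
    static Map<String, List<Long>> project(String[] k1, int[] n1) {
        Map<String, List<Long>> out = new LinkedHashMap<>();
        for (int i = 0; i < k1.length; i++) {
            out.put(k1[i], List.of((long) n1[i], 1L));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] n1 = {1, 2};
        // Unique keys: both forms agree.
        System.out.println(aggregate(new String[] {"a", "b"}, n1)
            .equals(project(new String[] {"a", "b"}, n1)));
        // Duplicate keys: the projection is no longer equivalent.
        System.out.println(aggregate(new String[] {"a", "a"}, n1)
            .equals(project(new String[] {"a", "a"}, n1)));
    }
}
```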
Within the same test case, with the same tables, the result of this query
has not changed:
SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql
INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql -- Logical Plan
LogicalAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): rowcount = 1.0, cumulative cost = {5.387500047683716 rows, 5.0 cpu, 0.0 io}, id = 1253
  LogicalProject(n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 1252
    LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1250
May 12, 2020 11:08:48 AM herddb.sql.CalcitePlanner runPlanner
INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql -- Best Plan
EnumerableAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): rowcount = 1.0, cumulative cost = {2.387500047683716 rows, 1.0 cpu, 0.0 io}, id = 1295
  EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1294
    BindableTableScan(table=[[tblspace1, tsql]], projects=[[1]]): rowcount = 2.0, cumulative cost = {0.012 rows, 0.018000000000000002 cpu, 0.0 io}, id = 1265
This is the test on HerdDB
https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/sql/SimplerPlannerTest.java#L237
I hope that helps
Enrico
On Tue, May 12, 2020 at 07:59, Haisheng Yuan <hyuan@apache.org>
wrote:
> Hi all,
>
> I have created a build for Apache Calcite 1.23.0, release
> candidate 0.
>
> Thanks to everyone who has contributed to this release.
>
> You can read the release notes here:
>
> https://github.com/apache/calcite/blob/calcite-1.23.0-rc0/site/_docs/history.md
>
> The commit to be voted upon:
>
> https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=edc37c0a21344a48b15877788e082c8acdc7b030
>
> Its hash is edc37c0a21344a48b15877788e082c8acdc7b030
>
> Tag:
> https://github.com/apache/calcite/tree/calcite-1.23.0-rc0
>
> The artifacts to be voted on are located here:
> https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.23.0-rc0
> (revision 39385)
>
> The hashes of the artifacts are as follows:
>
> 7482b0bb76e672a15bbe846f2dbdc125bd0f3d8a32abf0ea9159b5db0ab2a2d1182e19b408098ecd68d7cc9ff5d7812ea0b33e4aeac818d191b695d437fa1a94
> *apache-calcite-1.23.0-src.tar.gz
>
> A staged Maven repository is available for review at:
>
> https://repository.apache.org/content/repositories/orgapachecalcite-1088/org/apache/calcite/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/hyuan.asc
> https://dist.apache.org/repos/dist/release/calcite/KEYS
>
> N.B.
> To create the jars and test Apache Calcite: "./gradlew build".
>
> If you do not have a Java environment available, you can run the tests
> using docker. To do so, install docker and docker-compose, then run
> "docker-compose run test" from the root of the directory.
>
> Please vote on releasing this package as Apache Calcite 1.23.0.
>
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Calcite 1.23.0
> [ ] 0 I don't feel strongly about it, but I'm okay with the release
> [ ] -1 Do not release this package because...
>
>
> Here is my vote:
>
> +1 (binding)
>
> Thanks,
> Haisheng Yuan
>