calcite-dev mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Problems on HerdDB with 1.23 was [VOTE] Release apache-calcite-1.23.0 (release candidate 0)
Date Wed, 13 May 2020 08:15:30 GMT
Haisheng,




On Tue, 12 May 2020, 16:38 Haisheng Yuan <hyuan@apache.org> wrote:

> Hi Enrico,
>
> Thanks for reporting issues so quickly for calcite-1.23.0-rc0. I appreciate it.
> Can you log JIRA issues for these problems? We will fix them.
>
Doing it now


> Regarding the first issue, I guess several factors are contributing
> to it.
> 1. Trait enforcement is enabled for EnumerableConvention by default in
> 1.23.0, so it can now generate merge joins. We can disable it again if
> people would like.
>

Is it possible to disable it? I will check. Any suggestions are welcome.
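
For context on the assertion in the trace below: a merge join walks both inputs in key order, so it is only correct when each side is already sorted on its join keys. The following is a minimal plain-Java sketch of that precondition, not Calcite code. The class and method names are hypothetical, and it assumes unique keys on each side, as with the primary-key k1 columns in my tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not how Calcite implements merge join.
final class MergeJoinSketch {

    /** Returns true if the keys are in non-decreasing order. */
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Inner merge join on equal keys. Assumes each side has unique keys.
     * Refuses to run on unsorted input, mirroring the planner's assertion.
     */
    static List<String> mergeJoin(List<String> left, List<String> right) {
        if (!isSorted(left)) {
            throw new IllegalStateException("left input is not sorted on left keys");
        }
        if (!isSorted(right)) {
            throw new IllegalStateException("right input is not sorted on right keys");
        }
        List<String> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                matches.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // left key is smaller, advance the left cursor
            } else {
                j++; // right key is smaller, advance the right cursor
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(mergeJoin(List.of("a", "b", "c"), List.of("b", "c", "d")));
    }
}
```

With unsorted input the single forward pass would skip matches, which is why a plan that places a merge join over inputs without a collation has to be rejected or preceded by a sort.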


> 2. RelBuilder cannot yet handle physical operators' traits well,
> especially for Project.
>
> 3. Logical operators have been doing work that they are not expected to
> do and that physical operators should do. Here, when creating a
> LogicalProject, it tries to deduce its collation from the input MergeJoin.
> Project is a frequently created operator, but the profiler shows that
> RelTraitSet.replaceIfs() takes 65% of the total runtime of
> LogicalProject.create(). That operation is not only inappropriate but also
> a waste of time.
>
> 4. Transformation rules can match physical operators. In this case,
> JoinPushExpressionsRule matched EnumerableMergeJoin, but the rule
> cannot deal with physical operators well, because the traits are not
> properly handled. This does not happen only with JoinPushExpressionsRule:
> if you tweak the query, you might see a similar assertion error when
> applying FilterIntoJoinRule. The problem has been there since the rules'
> inception, but it was only exposed today by HerdDB. Does that mean no one
> intentionally uses Calcite's default rule implementations to match
> trait-aware physical operators? Can we safely stop matching physical
> operators in these rules? (ProjectMerge can be an exception; some people
> use it on physical operators for post-processing.)
>

Do you think that I can safely disable those rules?
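
What I have in mind is roughly the following: start from the default rule list and drop the rules named above before building the program. This is only a sketch in plain Java with rule names as strings. In HerdDB it would mean filtering Programs.RULE_SET before passing it to Programs.ofRules, and whether the planner still behaves correctly without those rules is exactly my question. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration: rules are modelled as plain names, not Calcite objects.
final class RuleFilterSketch {

    /** Returns the rules in their original order, minus the excluded ones. */
    static List<String> withoutRules(List<String> ruleNames, Set<String> excluded) {
        return ruleNames.stream()
                .filter(name -> !excluded.contains(name))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Stand-in for a default rule set; the real list is much longer.
        List<String> defaults = List.of(
                "JoinPushExpressionsRule",
                "FilterIntoJoinRule",
                "ProjectMergeRule");
        Set<String> excluded = Set.of("JoinPushExpressionsRule", "FilterIntoJoinRule");
        System.out.println(withoutRules(defaults, excluded));
    }
}
```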

Enrico


> Thanks,
> Haisheng
>
>
> On 2020/05/12 09:10:31, Enrico Olivelli <eolivelli@gmail.com> wrote:
> > Haisheng,
> > I am sorry, I have a couple of problems with HerdDB.
> >
> > 1) JOIN on unsorted columns fails in the presence of a WHERE over other columns
> > This is my case:
> >
> > CREATE TABLE tblspace1.table1 (k1 string primary key,n1 int,s1 string)
> > CREATE TABLE tblspace1.table3 (k1 string primary key,n3 int,s3 string)
> > SELECT t1.k1 as first, t2.k1 as second
> > FROM            tblspace1.table1 t1
> >  INNER JOIN tblspace1.table3 t2 ON t1.k1=t2.k1
> >  WHERE t1.n1 + 1 = t2.n3
> >
> > In this case for table1 and table3 no column is physically sorted (no
> > column with a collation)
> >
> > I have this Planner error:
> > java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
> > at org.apache.calcite.rel.metadata.RelMdCollation.mergeJoin(RelMdCollation.java:457)
> > at org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:153)
> > at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
> > at GeneratedMetadataHandler_Collation.collations(Unknown Source)
> > at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:539)
> > at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:273)
> > at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:122)
> > at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:242)
> > at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:121)
> > at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:111)
> > at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:172)
> > at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1464)
> > at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1258)
> > at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1230)
> > at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1219)
> > at org.apache.calcite.plan.RelOptUtil.pushDownJoinConditions(RelOptUtil.java:3620)
> > at org.apache.calcite.rel.rules.JoinPushExpressionsRule.onMatch(JoinPushExpressionsRule.java:59)
> > at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:221)
> > at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:519)
> > at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:535)
> > at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:292)
> >
> > *If I remove the "WHERE" clause then no error is reported.*
> > We have many other test cases about JOINs, and this is the only one
> > that fails.
> >
> > This is the failing test case on HerdDB:
> > https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/core/SimpleJoinTest.java#L522
> >
> > We are using the default set of rules Programs.ofRules(Programs.RULE_SET)
> >
> > I will try to create a reproducer in Calcite core test suite, in order to
> > understand if it is a bug in HerdDB or in Calcite
> > but I am reporting the problem as early as possible.
> > We wanted to create a daily job that tests HerdDB against current Calcite
> > master, but unfortunately we still have not found the time to do it.
> >
> > 2) Changed the data type of sum(N) from BIGINT to INTEGER
> >
> > I also noted that the type of sum(N), where N is an INTEGER column, is
> > now sometimes reported by Calcite as INTEGER and sometimes as BIGINT.
> > In 1.22 it was always reported as BIGINT.
> > So we have another test failing.
> >
> > SELECT sum(n1), count(*) as cc, k1
> > FROM tblspace1.tsql
> > GROUP by k1
> > ORDER BY sum(n1)
> >
> > Here sum(n1) is now reported as an INTEGER; previously it was a BIGINT.
> > I would prefer to see it as a BIGINT in order to prevent overflows.
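
To illustrate the overflow concern, here is a small plain-Java sketch (hypothetical class and method names, not Calcite or HerdDB code). It assumes INTEGER maps to a 32-bit int and BIGINT to a 64-bit long, and shows that accumulating into an int wraps silently while a long accumulator holds the true sum.

```java
// Hypothetical illustration of why a wider result type matters for SUM.
final class SumOverflowSketch {

    /** Accumulates into a 32-bit int; wraps around silently on overflow. */
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Accumulates into a 64-bit long; each int value is widened before adding. */
    static long sumAsLong(int[] values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MAX_VALUE, 1};
        System.out.println("int sum:  " + sumAsInt(values));
        System.out.println("long sum: " + sumAsLong(values));
    }
}
```

Two rows are enough to trigger the wrap-around here, so a column holding large INTEGER values can overflow a 32-bit sum even on small tables.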
> >
> > Here are the plans:
> > INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP by k1 ORDER BY sum(n1) -- Logical Plan
> > LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = {10.525000095367432 rows, 37.0 cpu, 0.0 io}, id = 1038
> >   LogicalProject(EXPR$0=[$1], CC=[$2], K1=[$0]): rowcount = 2.0, cumulative cost = {8.525000095367432 rows, 13.0 cpu, 0.0 io}, id = 1037
> >     LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)], CC=[COUNT()]): rowcount = 2.0, cumulative cost = {6.525000095367432 rows, 7.0 cpu, 0.0 io}, id = 1035
> >       LogicalProject(K1=[$0], n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 7.0 cpu, 0.0 io}, id = 1034
> >         LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1032
> >
> > May 12, 2020 11:07:37 AM herddb.sql.CalcitePlanner runPlanner
> > INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP by k1 ORDER BY sum(n1) -- Best  Plan
> > EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = {5.0 rows, 31.0 cpu, 0.0 io}, id = 1245
> >   EnumerableProject(EXPR$0=[$1], CC=[1:BIGINT], K1=[$0]): rowcount = 2.0, cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io}, id = 1244
> >     EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1243
> >       BindableTableScan(table=[[tblspace1, tsql]], projects=[[0, 1]]): rowcount = 2.0, cumulative cost = {0.016 rows, 0.024 cpu, 0.0 io}, id = 1055
> >
> > Within the same test case, with the same tables, the result of this
> > query has not changed:
> > SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql
> > INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql -- Logical Plan
> > LogicalAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): rowcount = 1.0, cumulative cost = {5.387500047683716 rows, 5.0 cpu, 0.0 io}, id = 1253
> >   LogicalProject(n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 1252
> >     LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1250
> >
> > May 12, 2020 11:08:48 AM herddb.sql.CalcitePlanner runPlanner
> > INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql -- Best  Plan
> > EnumerableAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): rowcount = 1.0, cumulative cost = {2.387500047683716 rows, 1.0 cpu, 0.0 io}, id = 1295
> >   EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1294
> >     BindableTableScan(table=[[tblspace1, tsql]], projects=[[1]]): rowcount = 2.0, cumulative cost = {0.012 rows, 0.018000000000000002 cpu, 0.0 io}, id = 1265
> >
> > This is the test on HerdDB:
> > https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/sql/SimplerPlannerTest.java#L237
> >
> > I hope that helps
> > Enrico
> >
> >
> > On Tue, 12 May 2020 at 07:59, Haisheng Yuan <hyuan@apache.org> wrote:
> >
> > > Hi all,
> > >
> > > I have created a build for Apache Calcite 1.23.0, release
> > > candidate 0.
> > >
> > > Thanks to everyone who has contributed to this release.
> > >
> > > You can read the release notes here:
> > > https://github.com/apache/calcite/blob/calcite-1.23.0-rc0/site/_docs/history.md
> > >
> > > The commit to be voted upon:
> > > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=edc37c0a21344a48b15877788e082c8acdc7b030
> > >
> > > Its hash is edc37c0a21344a48b15877788e082c8acdc7b030
> > >
> > > Tag:
> > > https://github.com/apache/calcite/tree/calcite-1.23.0-rc0
> > >
> > > The artifacts to be voted on are located here:
> > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.23.0-rc0
> > > (revision 39385)
> > >
> > > The hashes of the artifacts are as follows:
> > > 7482b0bb76e672a15bbe846f2dbdc125bd0f3d8a32abf0ea9159b5db0ab2a2d1182e19b408098ecd68d7cc9ff5d7812ea0b33e4aeac818d191b695d437fa1a94
> > > *apache-calcite-1.23.0-src.tar.gz
> > >
> > > A staged Maven repository is available for review at:
> > > https://repository.apache.org/content/repositories/orgapachecalcite-1088/org/apache/calcite/
> > >
> > > Release artifacts are signed with the following key:
> > > https://people.apache.org/keys/committer/hyuan.asc
> > > https://dist.apache.org/repos/dist/release/calcite/KEYS
> > >
> > > N.B.
> > > To create the jars and test Apache Calcite: "./gradlew build".
> > >
> > > If you do not have a Java environment available, you can run the tests
> > > using docker. To do so, install docker and docker-compose, then run
> > > "docker-compose run test" from the root of the directory.
> > >
> > > Please vote on releasing this package as Apache Calcite 1.23.0.
> > >
> > > The vote is open for the next 72 hours and passes if a majority of at
> > > least three +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Calcite 1.23.0
> > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
> > > [ ] -1 Do not release this package because...
> > >
> > >
> > > Here is my vote:
> > >
> > > +1 (binding)
> > >
> > > Thanks,
> > > Haisheng Yuan
> > >
> >
>
