calcite-dev mailing list archives

From Enrico Olivelli <eolive...@gmail.com>
Subject Re: Problems on HerdDB with 1.23 was [VOTE] Release apache-calcite-1.23.0 (release candidate 0)
Date Fri, 15 May 2020 08:02:39 GMT
Both tickets have been fixed on current master!
The former was a regression.
The latter was an improvement in Calcite that only required a fix to a test
in the HerdDB suite.
Check the JIRA issues for more details.

We are re-running all of the tests locally, both for HerdDB and for some known
downstream applications.

Thank you!
Enrico

On Wed, 13 May 2020 at 15:05, Enrico Olivelli <eolivelli@gmail.com> wrote:

> Tickets:
> https://issues.apache.org/jira/browse/CALCITE-3997
> https://issues.apache.org/jira/browse/CALCITE-3998
>
> I will try to create the reproducer, but maybe you will be smarter than me
> :-)
>
>
> Enrico
>
> On Wed, 13 May 2020 at 14:44, Haisheng Yuan <hyuan@apache.org> wrote:
>
>> > Yesterday I was trying to create a test case in the Calcite codebase,
>> > but I did not find where to put it.
>> > Can you please give me a hint?
>> Maybe JdbcTest.java, take testMergeJoin() as an example.
>>
>> > Otherwise I will try to create a minimal block of Java code that reproduces
>> > the problem. I did it that way last time and Stamatis was able to create the
>> > test in the Calcite code.
>> >
>> > Does this approach work for you?
>> That would also work.
>>
>> Thanks,
>> Haisheng
>> On 2020/05/13 12:31:26, Enrico Olivelli <eolivelli@gmail.com> wrote:
>> > On Wed, 13 May 2020, 13:45 Haisheng Yuan <hyuan@apache.org> wrote:
>> >
>> > > Hi Enrico,
>> > >
>> > > > Is it possible to disable it? I will check. Any suggestion is welcome.
>> > > Disabling it won't help. It is a Calcite bug. There is nothing wrong in
>> > > HerdDB. Can you help us log a JIRA and provide a reproducible test case?
>> > >
>> >
>> > Sorry for the delay.
>> > I had another problem today. I will do it as soon as possible.
>> >
>> > Yesterday I was trying to create a test case in the Calcite codebase,
>> > but I did not find where to put it.
>> > Can you please give me a hint?
>> > Otherwise I will try to create a minimal block of Java code that reproduces
>> > the problem. I did it that way last time and Stamatis was able to create the
>> > test in the Calcite code.
>> >
>> > Does this approach work for you?
>> >
>> > Enrico
>> >
>> >
>> > > > Do you think that I can safely disable those rules?
>> > > You have to create your own rule instances. But let Calcite do it for you.
>> > >
>> > > Thanks,
>> > > Haisheng Yuan
>> > >
>> > > On 2020/05/13 08:15:30, Enrico Olivelli <eolivelli@gmail.com> wrote:
>> > > > Haisheng,
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Tue, 12 May 2020, 16:38 Haisheng Yuan <hyuan@apache.org> wrote:
>> > > >
>> > > > > Hi Enrico,
>> > > > >
>> > > > > Thanks for reporting issues so quickly for calcite-1.23.0-rc0. Appreciate it.
>> > > > > Can you log JIRA issues for these? We will fix them.
>> > > > >
>> > > > Doing it now
>> > > >
>> > > >
>> > > > > Regarding the first issue, I guess several factors are contributing
>> > > > > to it.
>> > > > > 1. Trait enforcement is enabled for EnumerableConvention by default in
>> > > > > 1.23.0, so it can now generate merge joins. We can disable it again if
>> > > > > people would like.
>> > > > >
>> > > >
>> > > > Is it possible to disable it? I will check. Any suggestion is welcome.
>> > > >
>> > > >
>> > > > > 2. RelBuilder hasn't been able to handle physical operators' traits
>> > > > > well yet, especially for Project.
>> > > > >
>> > > > > 3. Logical operators have been doing some work that they are not
>> > > > > expected to do, and that physical operators should do instead. Here,
>> > > > > when creating a LogicalProject, it tries to deduce its collation from
>> > > > > the input MergeJoin. Project is a frequently created operator, but the
>> > > > > profiler shows that RelTraitSet.replaceIfs() takes 65% of the total
>> > > > > runtime of LogicalProject.create(). That is not only an inappropriate
>> > > > > operation, but also a time-wasting one.
>> > > > >
>> > > > > 4. Transformation rules can match physical operators. In this case,
>> > > > > JoinPushExpressionsRule matched EnumerableMergeJoin, but the rule
>> > > > > can't deal with physical operators well, because the traits are not
>> > > > > properly handled. This does not only happen with
>> > > > > JoinPushExpressionsRule: if you tweak the query, you might see a
>> > > > > similar assertion error when applying FilterIntoJoinRule. The problem
>> > > > > has been there since their inception, but it was only exposed today by
>> > > > > HerdDB. Does that mean no one uses Calcite's default rule
>> > > > > implementations to match trait-aware physical operators,
>> > > > > intentionally? Can we safely stop matching physical operators in these
>> > > > > rules? (ProjectMerge can be an exception; some people use it on
>> > > > > physical operators for post-processing.)
>> > > > >
>> > > >
>> > > > Do you think that I can safely disable those rules?
>> > > >
>> > > > Enrico
>> > > >
>> > > >
>> > > > > Thanks,
>> > > > > Haisheng
>> > > > >
>> > > > >
>> > > > > On 2020/05/12 09:10:31, Enrico Olivelli <eolivelli@gmail.com> wrote:
>> > > > > > Haisheng,
>> > > > > > I am sorry, I have a couple of problems with HerdDB.
>> > > > > >
>> > > > > > 1) JOIN over unsorted columns in the presence of a WHERE over other columns
>> > > > > > This is my case:
>> > > > > >
>> > > > > > CREATE TABLE tblspace1.table1 (k1 string primary key,n1 int,s1 string)
>> > > > > > CREATE TABLE tblspace1.table3 (k1 string primary key,n3 int,s3 string)
>> > > > > > SELECT t1.k1 as first, t2.k1 as second
>> > > > > > FROM            tblspace1.table1 t1
>> > > > > >  INNER JOIN tblspace1.table3 t2 ON t1.k1=t2.k1
>> > > > > >  WHERE t1.n1 + 1 = t2.n3
>> > > > > >
>> > > > > > In this case no column of table1 or table3 is physically sorted (no
>> > > > > > column with a collation).
>> > > > > >
>> > > > > > I have this Planner error:
>> > > > > > java.lang.AssertionError: cannot merge join: left input is not sorted on left keys
>> > > > > > at org.apache.calcite.rel.metadata.RelMdCollation.mergeJoin(RelMdCollation.java:457)
>> > > > > > at org.apache.calcite.rel.metadata.RelMdCollation.collations(RelMdCollation.java:153)
>> > > > > > at GeneratedMetadataHandler_Collation.collations_$(Unknown Source)
>> > > > > > at GeneratedMetadataHandler_Collation.collations(Unknown Source)
>> > > > > > at org.apache.calcite.rel.metadata.RelMetadataQuery.collations(RelMetadataQuery.java:539)
>> > > > > > at org.apache.calcite.rel.metadata.RelMdCollation.project(RelMdCollation.java:273)
>> > > > > > at org.apache.calcite.rel.logical.LogicalProject.lambda$create$0(LogicalProject.java:122)
>> > > > > > at org.apache.calcite.plan.RelTraitSet.replaceIfs(RelTraitSet.java:242)
>> > > > > > at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:121)
>> > > > > > at org.apache.calcite.rel.logical.LogicalProject.create(LogicalProject.java:111)
>> > > > > > at org.apache.calcite.rel.core.RelFactories$ProjectFactoryImpl.createProject(RelFactories.java:172)
>> > > > > > at org.apache.calcite.tools.RelBuilder.project_(RelBuilder.java:1464)
>> > > > > > at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1258)
>> > > > > > at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1230)
>> > > > > > at org.apache.calcite.tools.RelBuilder.project(RelBuilder.java:1219)
>> > > > > > at org.apache.calcite.plan.RelOptUtil.pushDownJoinConditions(RelOptUtil.java:3620)
>> > > > > > at org.apache.calcite.rel.rules.JoinPushExpressionsRule.onMatch(JoinPushExpressionsRule.java:59)
>> > > > > > at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:221)
>> > > > > > at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:519)
>> > > > > > at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:535)
>> > > > > > at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:292)
>> > > > > >
>> > > > > > *If I remove the "WHERE" clause then no error is reported.*
>> > > > > > We have many other test cases about JOINs and this one is the only one
>> > > > > > that fails.
>> > > > > >
>> > > > > > This is the failing test case on HerdDB:
>> > > > > > https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/core/SimpleJoinTest.java#L522
>> > > > > >
>> > > > > > We are using the default set of rules, Programs.ofRules(Programs.RULE_SET).
>> > > > > >
>> > > > > > I will try to create a reproducer in the Calcite core test suite, in order
>> > > > > > to understand whether it is a bug in HerdDB or in Calcite,
>> > > > > > but I am reporting the problem as early as possible.
>> > > > > > We wanted to create a daily job that tests HerdDB against current Calcite
>> > > > > > master, but unfortunately we still have not found the time to do it.
>> > > > > > 2) Changed the data type of sum(N) from BIGINT to INTEGER
>> > > > > >
>> > > > > > I also noted that the type of sum(N), where N is an INTEGER column,
>> > > > > > is now sometimes reported by Calcite as INTEGER and sometimes as
>> > > > > > BIGINT. In 1.22 it was always reported as BIGINT.
>> > > > > > So we have another test failing.
>> > > > > >
>> > > > > > SELECT sum(n1), count(*) as cc, k1
>> > > > > > FROM tblspace1.tsql
>> > > > > > GROUP by k1
>> > > > > > ORDER BY sum(n1)
>> > > > > >
>> > > > > > Here sum(n1) is now reported as an INTEGER; previously it was a BIGINT.
>> > > > > > I would prefer to see it as a BIGINT in order to prevent overflows.
>> > > > > >
>> > > > > > Here are the plans:
>> > > > > > INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP by k1 ORDER BY sum(n1) -- Logical Plan
>> > > > > > LogicalSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = {10.525000095367432 rows, 37.0 cpu, 0.0 io}, id = 1038
>> > > > > >   LogicalProject(EXPR$0=[$1], CC=[$2], K1=[$0]): rowcount = 2.0, cumulative cost = {8.525000095367432 rows, 13.0 cpu, 0.0 io}, id = 1037
>> > > > > >     LogicalAggregate(group=[{0}], EXPR$0=[SUM($1)], CC=[COUNT()]): rowcount = 2.0, cumulative cost = {6.525000095367432 rows, 7.0 cpu, 0.0 io}, id = 1035
>> > > > > >       LogicalProject(K1=[$0], n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 7.0 cpu, 0.0 io}, id = 1034
>> > > > > >         LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1032
>> > > > > >
>> > > > > > May 12, 2020 11:07:37 AM herddb.sql.CalcitePlanner runPlanner
>> > > > > > INFO: Query: SELECT sum(n1), count(*) as cc, k1  FROM tblspace1.tsql GROUP by k1 ORDER BY sum(n1) -- Best  Plan
>> > > > > > EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 2.0, cumulative cost = {5.0 rows, 31.0 cpu, 0.0 io}, id = 1245
>> > > > > >   EnumerableProject(EXPR$0=[$1], CC=[1:BIGINT], K1=[$0]): rowcount = 2.0, cumulative cost = {3.0 rows, 7.0 cpu, 0.0 io}, id = 1244
>> > > > > >     EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1243
>> > > > > >       BindableTableScan(table=[[tblspace1, tsql]], projects=[[0, 1]]): rowcount = 2.0, cumulative cost = {0.016 rows, 0.024 cpu, 0.0 io}, id = 1055
>> > > > > >
>> > > > > >
>> > > > > > Within the same test case, with the same tables, the result of this query
>> > > > > > has not changed:
>> > > > > > SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql
>> > > > > > INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql -- Logical Plan
>> > > > > > LogicalAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): rowcount = 1.0, cumulative cost = {5.387500047683716 rows, 5.0 cpu, 0.0 io}, id = 1253
>> > > > > >   LogicalProject(n1=[$1]): rowcount = 2.0, cumulative cost = {4.0 rows, 5.0 cpu, 0.0 io}, id = 1252
>> > > > > >     LogicalTableScan(table=[[tblspace1, tsql]]): rowcount = 2.0, cumulative cost = {2.0 rows, 3.0 cpu, 0.0 io}, id = 1250
>> > > > > >
>> > > > > > May 12, 2020 11:08:48 AM herddb.sql.CalcitePlanner runPlanner
>> > > > > > INFO: Query: SELECT sum(n1) as ss, min(n1) as mi, max(n1) as ma FROM tblspace1.tsql -- Best  Plan
>> > > > > > EnumerableAggregate(group=[{}], SS=[SUM($0)], MI=[MIN($0)], MA=[MAX($0)]): rowcount = 1.0, cumulative cost = {2.387500047683716 rows, 1.0 cpu, 0.0 io}, id = 1295
>> > > > > >   EnumerableInterpreter: rowcount = 2.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 1294
>> > > > > >     BindableTableScan(table=[[tblspace1, tsql]], projects=[[1]]): rowcount = 2.0, cumulative cost = {0.012 rows, 0.018000000000000002 cpu, 0.0 io}, id = 1265
>> > > > > >
>> > > > > > This is the test on HerdDB:
>> > > > > > https://github.com/diennea/herddb/blob/vote-calcite-123/herddb-core/src/test/java/herddb/sql/SimplerPlannerTest.java#L237
>> > > > > >
>> > > > > > I hope that helps
>> > > > > > Enrico
>> > > > > >
>> > > > > >
>> > > > > > On Tue, 12 May 2020 at 07:59, Haisheng Yuan <hyuan@apache.org> wrote:
>> > > > > >
>> > > > > > > Hi all,
>> > > > > > >
>> > > > > > > I have created a build for Apache Calcite 1.23.0, release
>> > > > > > > candidate 0.
>> > > > > > >
>> > > > > > > Thanks to everyone who has contributed to this release.
>> > > > > > >
>> > > > > > > You can read the release notes here:
>> > > > > > > https://github.com/apache/calcite/blob/calcite-1.23.0-rc0/site/_docs/history.md
>> > > > > > >
>> > > > > > > The commit to be voted upon:
>> > > > > > > https://gitbox.apache.org/repos/asf?p=calcite.git;a=commit;h=edc37c0a21344a48b15877788e082c8acdc7b030
>> > > > > > >
>> > > > > > > Its hash is edc37c0a21344a48b15877788e082c8acdc7b030
>> > > > > > >
>> > > > > > > Tag:
>> > > > > > > https://github.com/apache/calcite/tree/calcite-1.23.0-rc0
>> > > > > > >
>> > > > > > > The artifacts to be voted on are located here:
>> > > > > > > https://dist.apache.org/repos/dist/dev/calcite/apache-calcite-1.23.0-rc0
>> > > > > > > (revision 39385)
>> > > > > > >
>> > > > > > > The hashes of the artifacts are as follows:
>> > > > > > > 7482b0bb76e672a15bbe846f2dbdc125bd0f3d8a32abf0ea9159b5db0ab2a2d1182e19b408098ecd68d7cc9ff5d7812ea0b33e4aeac818d191b695d437fa1a94
>> > > > > > > *apache-calcite-1.23.0-src.tar.gz
>> > > > > > >
>> > > > > > > A staged Maven repository is available for review at:
>> > > > > > > https://repository.apache.org/content/repositories/orgapachecalcite-1088/org/apache/calcite/
>> > > > > > >
>> > > > > > > Release artifacts are signed with the following key:
>> > > > > > > https://people.apache.org/keys/committer/hyuan.asc
>> > > > > > > https://dist.apache.org/repos/dist/release/calcite/KEYS
>> > > > > > >
>> > > > > > > N.B.
>> > > > > > > To create the jars and test Apache Calcite: "./gradlew build".
>> > > > > > >
>> > > > > > > If you do not have a Java environment available, you can run the tests
>> > > > > > > using docker. To do so, install docker and docker-compose, then run
>> > > > > > > "docker-compose run test" from the root of the directory.
>> > > > > > >
>> > > > > > > Please vote on releasing this package as Apache Calcite 1.23.0.
>> > > > > > >
>> > > > > > > The vote is open for the next 72 hours and passes if a majority of at
>> > > > > > > least three +1 PMC votes are cast.
>> > > > > > >
>> > > > > > > [ ] +1 Release this package as Apache Calcite 1.23.0
>> > > > > > > [ ]  0 I don't feel strongly about it, but I'm okay with the release
>> > > > > > > [ ] -1 Do not release this package because...
>> > > > > > >
>> > > > > > >
>> > > > > > > Here is my vote:
>> > > > > > >
>> > > > > > > +1 (binding)
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > > Haisheng Yuan
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
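
The assertion in the first problem reported in this thread ("cannot merge join: left input is not sorted on left keys") reflects the general precondition of a merge join: both inputs must already be ordered on the join keys. The sketch below is a toy illustration of that precondition in plain Java, assuming unique string keys as with the k1 primary keys in the thread. The class MergeJoinSketch and its methods are hypothetical names invented for this example; this is not Calcite's EnumerableMergeJoin or RelMdCollation code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy merge join over unique string keys.
// Hypothetical names; not taken from Calcite or HerdDB.
final class MergeJoinSketch {

    /** Returns true if the keys are in non-decreasing order. */
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the keys present in both inputs.
     * Both inputs must be sorted; keys are assumed unique within each input.
     */
    static List<String> mergeJoin(List<String> left, List<String> right) {
        if (!isSorted(left)) {
            throw new IllegalStateException("left input is not sorted on left keys");
        }
        if (!isSorted(right)) {
            throw new IllegalStateException("right input is not sorted on right keys");
        }
        List<String> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Advance whichever side has the smaller key; emit on equality.
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp == 0) {
                matches.add(left.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++;
            } else {
                j++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(mergeJoin(Arrays.asList("a", "b", "d"), Arrays.asList("b", "c", "d")));
        try {
            mergeJoin(Arrays.asList("b", "a"), Arrays.asList("a", "b"));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A single linear pass is only correct when both sides are ordered, which is why a planner has to either prove the ordering or add a sort before choosing this operator.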

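The overflow concern behind the second problem reported in this thread (sum(n1) typed as INTEGER instead of BIGINT) can be shown with plain Java arithmetic. This is a minimal sketch, assuming that an INTEGER result behaves like a 32-bit accumulator and a BIGINT result like a 64-bit one. The class SumOverflowDemo and its methods are hypothetical names for illustration, not Calcite or HerdDB code.

```java
// Illustrative only: plain Java arithmetic with hypothetical names.
final class SumOverflowDemo {

    /** Sums into a 32-bit accumulator; Math.addExact throws on overflow. */
    static int sumAsInt(int[] values) {
        int total = 0;
        for (int v : values) {
            total = Math.addExact(total, v);
        }
        return total;
    }

    /** Sums the same 32-bit inputs into a 64-bit accumulator. */
    static long sumAsLong(int[] values) {
        long total = 0L;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] n1 = {Integer.MAX_VALUE, 1};
        System.out.println("64-bit sum: " + sumAsLong(n1));
        try {
            System.out.println("32-bit sum: " + sumAsInt(n1));
        } catch (ArithmeticException e) {
            System.out.println("32-bit sum overflowed: " + e.getMessage());
        }
    }
}
```

Two values that each fit in an INTEGER column are enough to exceed the 32-bit range, which is the case the wider result type guards against.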