calcite-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "pengzhiwei (JIRA)" <>
Subject [jira] [Created] (CALCITE-2630) Convert SqlInOperator to a IN-Function
Date Thu, 18 Oct 2018 04:07:00 GMT
pengzhiwei created CALCITE-2630:

             Summary: Convert SqlInOperator to a IN-Function
                 Key: CALCITE-2630
             Project: Calcite
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.17.0
            Reporter: pengzhiwei
            Assignee: Julian Hyde

Currently Calcite translate "IN" to "OR" expression when the count of  IN's operands less
than "inSubQueryThreshold" and to "Join" when the count greater  than "inSubQueryThreshold"
to get better performance.

  However this translation to "JOIN" is so complex,especially when the "IN" expression located
in the "select" or "join on condition".For example:

select case when deptno in (1,2) then 0 else 1 end from emp
the logical plan generated as follow:


LogicalProject(EXPR$0=[CASE(CAST(CASE(=($9, 0), false, IS NOT NULL($13), true, IS NULL($11),
null, <($10, $9), null, false)):BOOLEAN NOT NULL, 0, 1)])
LogicalJoin(condition=[=($11, $12)], joinType=[left])
LogicalProject(EMPNO=[$0], ENAME=[$1], JOB=[$2], MGR=[$3], HIREDATE=[$4], SAL=[$5], COMM=[$6],
DEPTNO=[$7], SLACKER=[$8], $f0=[$9], $f1=[$10], DEPTNO0=[$7])
LogicalJoin(condition=[true], joinType=[inner])
LogicalTableScan(table=[[CATALOG, SALES, EMP]])
LogicalAggregate(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
LogicalProject(ROW_VALUE=[$0], $f1=[true])
LogicalValues(tuples=[[{ 1 }, { 2 }]])
LogicalAggregate(group=[{0}], agg#0=[MIN($1)])
LogicalProject(ROW_VALUE=[$0], $f1=[true])
LogicalValues(tuples=[[{ 1 }, { 2 }]])

The generated logical plan is so complex for such a simple sql!


I think we can treat "IN" as a function like "plus" and "minus".And there is no translation spending
on "IN" just keep it as it is.This would be much clear in the logical plan!

In the compute stage,We can provide a "InExpression":


 We can put all the constant conditions to a "Set".In that way,the computational complexity
can reduce from O(n) to O(1).

It would be much clear and have a good performance.And we have implement it in our streaming-sql


This message was sent by Atlassian JIRA

View raw message