calcite-dev mailing list archives

From "Jiarong Wei (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CALCITE-1428) Inefficient execution plan of SELECT and LIMIT for Druid
Date Tue, 11 Oct 2016 10:08:20 GMT
Jiarong Wei created CALCITE-1428:
------------------------------------

             Summary: Inefficient execution plan of SELECT and LIMIT for Druid
                 Key: CALCITE-1428
                 URL: https://issues.apache.org/jira/browse/CALCITE-1428
             Project: Calcite
          Issue Type: Bug
          Components: core, druid
            Reporter: Jiarong Wei
            Assignee: Julian Hyde


For SQL queries such as:

1. {{SELECT * FROM <table> LIMIT <row_count>}}
2. {{SELECT <all_columns_specified_explicitly> FROM <table> LIMIT <row_count>}}

{{DruidSortRule}} in the Druid adapter does fire, and {{LIMIT}} is pushed into {{DruidQuery}}.
However, the resulting execution plan is not chosen as the best one. As a result, Calcite retrieves
all the data from Druid and applies the limit itself, discarding everything it does not need.

Below are three SQL queries and their execution plans for the {{wikiticker}} dataset
from the Druid quickstart:

1. {{SELECT "cityName" FROM "wikiticker" LIMIT 5}}

{code}
rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])

rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
{code}

2. {{SELECT * FROM "wikiticker" LIMIT 5}}

{code}
rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)

rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])

rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
{code}

3. {{SELECT "__time", "added", "channel", "cityName", "comment", "commentLength", "count",
"countryIsoCode", "countryName", "deleted", "delta", "deltaBucket", "diffUrl", "flags", "isAnonymous",
"isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode",
"regionName", "user", "user_unique" FROM "wikiticker" LIMIT 5}}

{code}
rel#42:EnumerableLimit.ENUMERABLE.[](input=rel#41:Subset#1.ENUMERABLE.[],fetch=5)

rel#113:EnumerableInterpreter.ENUMERABLE.[](input=rel#34:Subset#1.BINDABLE.[])

rel#52:BindableProject.BINDABLE.[](input=rel#4:Subset#0.BINDABLE.[],__time=$0,added=$1,channel=$2,cityName=$3,comment=$4,commentLength=$5,count=$6,countryIsoCode=$7,countryName=$8,deleted=$9,delta=$10,deltaBucket=$11,diffUrl=$12,flags=$13,isAnonymous=$14,isMinor=$15,isNew=$16,isRobot=$17,isUnpatrolled=$18,metroCode=$19,namespace=$20,page=$21,regionIsoCode=$22,regionName=$23,user=USER,user_unique=$25)

rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
{code}

Note that queries 2 and 3 should have {{LIMIT}} pushed into {{DruidQuery}}, as in query 1, and
should not contain {{EnumerableLimit}}.
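
The expected outcome can be expressed as a small check over the plan text. The following is only an illustrative sketch, not part of Calcite: the helper name {{limit_pushed_to_druid}} is hypothetical, and it assumes the plan is available as plain text in the format shown above (one node per line, with the limit appearing as {{fetch=<n>}} on the {{DruidQuery}} line).

```python
import re


def limit_pushed_to_druid(plan: str) -> bool:
    """Hypothetical helper: True if LIMIT appears inside DruidQuery
    and no EnumerableLimit node remains in the plan text."""
    has_enumerable_limit = "EnumerableLimit" in plan
    # A pushed-down limit shows up as fetch=<n> on the DruidQuery line itself.
    druid_fetch = re.search(r"DruidQuery\.[^\n]*\bfetch=\d+", plan)
    return druid_fetch is not None and not has_enumerable_limit


# Plan for query 1, copied from the report: the limit is inside DruidQuery.
plan_1 = """\
rel#27:EnumerableInterpreter.ENUMERABLE.[](input=rel#26:Subset#2.BINDABLE.[])
rel#85:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z],projects=[$3],fetch=5)
"""

# Plan for query 2, copied from the report: the limit stays in EnumerableLimit.
plan_2 = """\
rel#52:EnumerableLimit.ENUMERABLE.[](input=rel#36:Subset#0.ENUMERABLE.[],fetch=5)
rel#79:EnumerableInterpreter.ENUMERABLE.[](input=rel#4:Subset#0.BINDABLE.[])
rel#1:DruidQuery.BINDABLE.[](table=[default, wikiticker],intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z])
"""

if __name__ == "__main__":
    print("query 1 pushed down:", limit_pushed_to_druid(plan_1))
    print("query 2 pushed down:", limit_pushed_to_druid(plan_2))
```

Under these assumptions the check passes for the plan of query 1 and fails for the plan of query 2; once the issue is fixed, the plans for queries 2 and 3 would be expected to pass as well.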



