drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5199) Planner inserts three projects when one will do
Date Mon, 16 Jan 2017 18:58:26 GMT
Paul Rogers created DRILL-5199:
----------------------------------

             Summary: Planner inserts three projects when one will do
                 Key: DRILL-5199
                 URL: https://issues.apache.org/jira/browse/DRILL-5199
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.9.0
            Reporter: Paul Rogers
            Priority: Minor


See the query and description for DRILL-5198. The plan for that query has a number of opportunities
for improvement. This bug covers a minor one: the plan contains three consecutive project
operators where a single project would probably work just as well (and would be somewhat
more efficient).

Here is the subset of the plan in question:

{code}
02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 449
03-01                          Project(T0¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
03-02                            Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
03-03                              Project(T0¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
03-04                                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
{code}
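
To illustrate what a single combined project might look like, here is a minimal Python sketch that composes the three projects above by substituting each input reference ($0, $1) with the expression the project below produces for that column. This is not Drill or Calcite code: the tuple encoding of expressions and the helper names (substitute, merge_projects, render) are assumptions made for this example only.

```python
# Hypothetical expression encoding (not Drill's):
#   ("ref", i)            input reference $i
#   ("lit", v)            literal value
#   ("call", name, args)  function call with a list of argument expressions

def substitute(expr, inputs):
    """Rewrite expr so that each $i is replaced by inputs[i]."""
    kind = expr[0]
    if kind == "ref":
        return inputs[expr[1]]
    if kind == "call":
        return ("call", expr[1], [substitute(a, inputs) for a in expr[2]])
    return expr  # literals are unchanged

def merge_projects(upper, lower):
    """Compose two projects, where upper reads the output columns of lower."""
    lower_exprs = [e for _, e in lower]
    return [(name, substitute(e, lower_exprs)) for name, e in upper]

def render(expr):
    """Format an expression the way the plan text shows it."""
    kind = expr[0]
    if kind == "ref":
        return "$%d" % expr[1]
    if kind == "lit":
        return str(expr[1])
    return "%s(%s)" % (expr[1], ", ".join(render(a) for a in expr[2]))

# The three projects from the plan, transcribed by hand.
p_03_03 = [("T0¦¦*", ("ref", 0)), ("columns", ("ref", 1))]
p_03_02 = [("T0¦¦*", ("ref", 0)),
           ("EXPR$1", ("call", "ITEM", [("ref", 1), ("lit", 0)]))]
p_03_01 = [("T0¦¦*", ("ref", 0)),
           ("EXPR$1", ("ref", 1)),
           ("E_X_P_R_H_A_S_H_F_I_E_L_D",
            ("call", "hash32AsDouble", [("ref", 1)]))]

# Merge bottom-up: 03-02 over 03-03, then 03-01 over that result.
merged = merge_projects(p_03_01, merge_projects(p_03_02, p_03_03))
for name, expr in merged:
    print("%s=[%s]" % (name, render(expr)))
# T0¦¦*=[$0]
# EXPR$1=[ITEM($1, 0)]
# E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble(ITEM($1, 0))]
```

Note that the merged form repeats ITEM($1, 0). Whether that costs anything depends on whether the generated code shares the common subexpression, which this sketch does not address.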

This issue is minor because project is a relatively inexpensive operation: it inserts or removes
a vector once per batch rather than doing work row by row. Still, every little bit
of optimization helps.
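
For a rough sense of how the planner itself prices these operators, the following Python sketch subtracts adjacent cumulative costs in the plan above. It assumes that each operator's cumulative cost is its own cost plus that of its single input, which is an assumption about the cost model, not something stated in this report. The figures are planner estimates, not measured run times.

```python
# (operator, cumulative rows, cumulative cpu), copied from the plan, bottom-up.
plan = [
    ("03-04 Scan",                 179424053,  358848106),
    ("03-03 Project",              179424053,  358848106),
    ("03-02 Project",              358848106, 1076544318),
    ("03-01 Project",              538272159, 1794240530),
    ("02-01 UnorderedMuxExchange", 717696212, 1973664583),
]

def incremental(plan):
    """Return each operator's own (rows, cpu), assuming costs accumulate up a linear chain."""
    out = []
    prev_rows = prev_cpu = 0
    for name, rows, cpu in plan:
        out.append((name, rows - prev_rows, cpu - prev_cpu))
        prev_rows, prev_cpu = rows, cpu
    return out

for name, rows, cpu in incremental(plan):
    print("%-28s rows=%-11d cpu=%d" % (name, rows, cpu))
```

Under that assumption, project 03-03 is estimated at zero cost, while 03-02 and 03-01 are each estimated at 179424053 rows and 717696212 cpu, that is, 4 cpu units per row.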



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
