hive-issues mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
Date Wed, 21 Jun 2017 02:10:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837 ]

liyunzhang_intel edited comment on HIVE-11297 at 6/21/17 2:09 AM:
------------------------------------------------------------------

[~csun]: I applied HIVE-11297.6.patch to the latest master branch (8c5f55e) and ran the query I posted
above, then printed the operator tree.

SplitOpTreeForDPP#process
{code}
.....
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}
....

{code}

The operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}

So can you retest it in your environment? If the operator tree is like the one you mentioned, I think
all the operator trees in spark_dynamic_partition_pruning.q.out will differ from the ones I generated
in my environment.
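
For readers following the thread, here is a minimal, self-contained sketch of what the loop quoted above does on the printed tree. The `Op` class and `findBranchPoint` are hypothetical stand-ins, not Hive's `Operator` API, and the tree is rebuilt only from the three printed paths (so any other parents of JOIN[5] are omitted). Unlike the quoted loop, the sketch also guards against an operator with no parents.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hive operator; NOT the real Operator class.
class Op {
  final String name;
  final List<Op> parents = new ArrayList<>();
  final List<Op> children = new ArrayList<>();

  Op(String name) {
    this.name = name;
  }

  // Attach a new child to this operator and return the child for chaining.
  Op child(String childName) {
    Op c = new Op(childName);
    children.add(c);
    c.parents.add(this);
    return c;
  }
}

class BranchPointSketch {
  // Walk up from start via the first parent until an operator with more
  // than one child is found. Returns null if no such operator exists.
  static Op findBranchPoint(Op start) {
    Op cur = start;
    while (cur != null) {
      if (cur.children.size() > 1) {
        return cur;
      }
      // Guard added for the sketch: stop at an operator with no parents.
      cur = cur.parents.isEmpty() ? null : cur.parents.get(0);
    }
    return null;
  }

  // Rebuild the tree from the three printed paths; returns SPARKPRUNINGSINK[20].
  static Op buildSinkFromPrintedTree() {
    Op ts = new Op("TS[1]");
    Op fil = ts.child("FIL[17]");
    fil.child("RS[4]").child("JOIN[5]").child("GBY[8]")
        .child("RS[9]").child("GBY[10]").child("FS[12]");
    Op sink20 = fil.child("SEL[18]").child("GBY[19]").child("SPARKPRUNINGSINK[20]");
    fil.child("SEL[21]").child("GBY[22]").child("SPARKPRUNINGSINK[23]");
    return sink20;
  }

  public static void main(String[] args) {
    Op branch = findBranchPoint(buildSinkFromPrintedTree());
    System.out.println(branch == null ? "no branch point" : branch.name);
  }
}
```

Under these assumptions, the walk starting at SPARKPRUNINGSINK[20] passes GBY[19] and SEL[18] (one child each) and stops at FIL[17], which has three children.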



was (Author: kellyzly):
[~csun]: I applied HIVE-11297.6.patch to the latest master branch (8c5f55e) and ran the query I posted
above, then printed the operator tree of filterOp.

SplitOpTreeForDPP#process
{code}
.....
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}
....

{code}

The operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}


> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates partition info for more than one partition column, multiple operator trees are created, which all start from the same table scan op but have different Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table scan does not have to be done multiple times.
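
The idea in the description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the approach taken in any of the attached patches: operators are represented by hypothetical string labels, each separate op tree is a linear chain starting at the same table scan, and chains are merged wherever their labels agree. `OpNode`, `combine`, and `countNodes` are names invented for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node type for the illustration; NOT a Hive class.
class OpNode {
  final String name;
  // Children keyed by label so that a shared prefix is stored only once.
  final Map<String, OpNode> children = new LinkedHashMap<>();

  OpNode(String name) {
    this.name = name;
  }
}

class CombineTreesSketch {
  // Merge linear operator chains that all start at the same table scan
  // into a single tree, so the scan (and any shared prefix) appears once.
  static OpNode combine(List<List<String>> chains) {
    if (chains.isEmpty() || chains.get(0).isEmpty()) {
      throw new IllegalArgumentException("need at least one non-empty chain");
    }
    String rootName = chains.get(0).get(0);
    OpNode root = new OpNode(rootName);
    for (List<String> chain : chains) {
      if (chain.isEmpty() || !chain.get(0).equals(rootName)) {
        throw new IllegalArgumentException("all chains must start at " + rootName);
      }
      OpNode cur = root;
      for (String name : chain.subList(1, chain.size())) {
        // Reuse an existing child with this label, or create it.
        cur = cur.children.computeIfAbsent(name, OpNode::new);
      }
    }
    return root;
  }

  // Count the operators in the combined tree.
  static int countNodes(OpNode node) {
    int n = 1;
    for (OpNode c : node.children.values()) {
      n += countNodes(c);
    }
    return n;
  }
}
```

For example, merging the hypothetical chains TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20] and TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23] gives one tree with 8 operators instead of 10, in which TS[1] appears once and FIL[17] feeds both pruning sink branches.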



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
