hive-issues mailing list archives

From "liyunzhang_intel (JIRA)" <>
Subject [jira] [Commented] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]
Date Mon, 05 Jun 2017 05:41:04 GMT


liyunzhang_intel commented on HIVE-11297:

[~csun]: Thanks for the review. I have replied to you on Review Board.
bq. Seems this removes the extra map work after it was generated. Is there a way to avoid generating the map work in the first place?
The physical operator tree will be split by the Spark partition pruning sink.
Original tree:
After the split by the Spark partition pruning sink:
If we want to avoid generating multiple map works ({noformat}TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20],TS[1]-FIL[17]-SEL[18]-GBY[22]-SPARKPRUNINGSINK[23]{noformat}),
we need to remove the rule for Spark dynamic partition pruning. If we remove that rule, an exception
will be thrown because the remaining tree will not be in a MapWork. The rule is:
opRules.put(new RuleRegExp("Split Work - SparkPartitionPruningSink",
    SparkPartitionPruningSinkOperator.getOperatorName() + "%"), genSparkWork);


If you have any ideas about this, please give me your suggestions.

> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>                 Key: HIVE-11297
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch
> Currently, for dynamic partition pruning in Spark, if a small table generates partition info for more than one partition column, multiple operator trees are created, which all start from the same table scan op but have different Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do the table scan multiple times.
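
To make the idea of combining op trees concrete, here is a minimal, self-contained sketch. It is not the code from HIVE-11297.1.patch and does not use Hive's Operator classes: `OpTreeMerge`, `Op`, `merge`, and `render` are hypothetical names, and matching operators by their printed name (for example `TS[1]`) is an assumption made only for illustration. It takes the two chains quoted in the comment above and merges their common prefix so that the table scan appears once.

```java
import java.util.ArrayList;
import java.util.List;

public class OpTreeMerge {
    /** Toy operator node (hypothetical; not Hive's Operator class). */
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        /** Returns the child with the given name, or null if there is none. */
        Op childNamed(String n) {
            for (Op c : children) {
                if (c.name.equals(n)) return c;
            }
            return null;
        }
    }

    /** Merges linear chains that start at the same root into one tree with a shared prefix. */
    static Op merge(List<List<String>> chains) {
        if (chains.isEmpty() || chains.get(0).isEmpty()) {
            throw new IllegalArgumentException("need at least one non-empty chain");
        }
        Op root = new Op(chains.get(0).get(0));
        for (List<String> chain : chains) {
            if (chain.isEmpty() || !chain.get(0).equals(root.name)) {
                throw new IllegalArgumentException("chains must share the same root");
            }
            Op cur = root;
            for (int i = 1; i < chain.size(); i++) {
                Op next = cur.childNamed(chain.get(i));
                if (next == null) {            // first point where this chain diverges
                    next = new Op(chain.get(i));
                    cur.children.add(next);
                }
                cur = next;
            }
        }
        return root;
    }

    /** Renders the tree in a dash-separated form; branches are shown as (a,b). */
    static String render(Op op) {
        if (op.children.isEmpty()) return op.name;
        if (op.children.size() == 1) return op.name + "-" + render(op.children.get(0));
        StringBuilder sb = new StringBuilder(op.name).append("-(");
        for (int i = 0; i < op.children.size(); i++) {
            if (i > 0) sb.append(",");
            sb.append(render(op.children.get(i)));
        }
        return sb.append(")").toString();
    }

    public static void main(String[] args) {
        List<List<String>> chains = List.of(
            List.of("TS[1]", "FIL[17]", "SEL[18]", "GBY[19]", "SPARKPRUNINGSINK[20]"),
            List.of("TS[1]", "FIL[17]", "SEL[18]", "GBY[22]", "SPARKPRUNINGSINK[23]"));
        // Prints: TS[1]-FIL[17]-SEL[18]-(GBY[19]-SPARKPRUNINGSINK[20],GBY[22]-SPARKPRUNINGSINK[23])
        System.out.println(render(merge(chains)));
    }
}
```

In this toy model the shared TS-FIL-SEL prefix is kept once and only the GBY and pruning sink operators branch, which is the effect the issue description asks for. Whether the merge is better done while the works are generated or afterwards is the open question in the comment above.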

