hive-issues mailing list archives

From "liyunzhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18148) NPE in SparkDynamicPartitionPruningResolver
Date Wed, 13 Dec 2017 07:08:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288814#comment-16288814 ]

liyunzhang commented on HIVE-18148:
-----------------------------------

[~lirui]: I tried the example you provided:
{code}

set hive.spark.dynamic.partition.pruning=true;
explain select * from src join part1 on src.key=part1.p join part2 on src.value=part2.q;

{code}

But in my environment (latest build: 095e6bf8988a03875bc9340b2ab82d5d13c4cb3c), the physical plan
before SparkCompiler#removeNestedDPP is:
{code}
TS[0]-FIL[22]-SEL[2]-RS[9]-MAPJOIN[32]-MAPJOIN[31]-SEL[15]-FS[16]
TS[3]-FIL[23]-SEL[5]-MAPJOIN[32]
TS[6]-FIL[24]-SEL[8]-RS[13]-MAPJOIN[31]
{code}
There is no DPP operator for removeNestedDPP to traverse because of HIVE-17087: SparkMapJoinOptimizer#convertJoinMapJoin
removes the DPP operator when there is a map join operator.

So did you reproduce the NPE with {{hive.auto.convert.join.noconditionaltask}} set to
false? BTW, how does Hive on Tez deal with this kind of nested DPP case?
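
For reference, the variant being asked about would be the same query with that one extra setting. This is only a sketch: it assumes the same src, part1 and part2 tables as above, and whether it keeps the DPP operator in the plan or reproduces the NPE is not confirmed here.
{code}
set hive.spark.dynamic.partition.pruning=true;
-- the extra setting in question; its effect on the map join conversion is the open point
set hive.auto.convert.join.noconditionaltask=false;
explain select * from src join part1 on src.key=part1.p join part2 on src.value=part2.q;
{code}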


> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
>                 Key: HIVE-18148
>                 URL: https://issues.apache.org/jira/browse/HIVE-18148
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-18148.1.patch
>
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null. The root
> cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
