hive-issues mailing list archives

From "liyunzhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18148) NPE in SparkDynamicPartitionPruningResolver
Date Mon, 11 Dec 2017 03:12:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16285468#comment-16285468 ]

liyunzhang commented on HIVE-18148:
-----------------------------------

[~lirui]: I cannot reproduce the problem because you did not provide the full script to reproduce it.
But I guess the following query matches the case you mentioned:
[spark_dynamic_partition_pruning.q#https://github.com/kellyzly/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L29]
{code}
EXPLAIN select count(*) from srcpart join srcpart_date on (srcpart.ds = srcpart_date.ds) join
srcpart_hour on (srcpart.hr = srcpart_hour.hr)
where srcpart_date.`date` = '2008-04-08' and srcpart_hour.hour = 11;
{code}
srcpart is similar to src, srcpart_date is similar to part1, and srcpart_hour is similar
to part2 in your example. But the operator tree looks like this:
{code}
TS[0]-SEL[2]-MAPJOIN[36]-MAPJOIN[35]-GBY[16]-RS[17]-GBY[18]-FS[20]
TS[3]-FIL[27]-SEL[5]-RS[10]-MAPJOIN[36]
                    -SEL[29]-GBY[30]-SPARKPRUNINGSINK[31]
TS[6]-FIL[28]-SEL[8]-RS[13]-MAPJOIN[35]
                    -SEL[32]-GBY[33]-SPARKPRUNINGSINK[34]
{code}
Here TS\[3\] is srcpart_date, TS\[6\] is srcpart_hour, and TS\[0\] is srcpart (the counterpart of
src in your example). But there is no nested DPP problem here. So what am I missing?
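
To make the reading of the tree above easier to check, here is a minimal sketch that parses the printed operator tree and lists which table scan each SPARKPRUNINGSINK is printed under. The function name `pruning_sinks_by_root` and the parsing rules are my own assumptions for illustration and are not part of Hive. The printed tree does not show which table scan each sink targets, so this only shows where the sinks sit, not the full nested-DPP condition.

```python
import re

# Operator tree exactly as printed in the comment above.
TREE = """\
TS[0]-SEL[2]-MAPJOIN[36]-MAPJOIN[35]-GBY[16]-RS[17]-GBY[18]-FS[20]
TS[3]-FIL[27]-SEL[5]-RS[10]-MAPJOIN[36]
                    -SEL[29]-GBY[30]-SPARKPRUNINGSINK[31]
TS[6]-FIL[28]-SEL[8]-RS[13]-MAPJOIN[35]
                    -SEL[32]-GBY[33]-SPARKPRUNINGSINK[34]
"""

# Matches one operator token such as "TS[0]" or "SPARKPRUNINGSINK[31]".
OP = re.compile(r"[A-Z]+\[\d+\]")


def pruning_sinks_by_root(tree_text):
    """Map each root table scan to the pruning sinks printed under it.

    Hypothetical helper: assumes a chain starts at column 0 and that
    indented lines starting with "-" continue the previous chain.
    """
    sinks = {}
    root = None
    for line in tree_text.splitlines():
        ops = OP.findall(line)
        if not ops:
            continue
        if not line.startswith((" ", "-")):
            # A line starting at column 0 begins a new chain.
            root = ops[0]
            sinks.setdefault(root, [])
        sinks[root].extend(op for op in ops if op.startswith("SPARKPRUNINGSINK"))
    return sinks


result = pruning_sinks_by_root(TREE)
for root, found in result.items():
    print(root, found)
```

In this tree the two pruning sinks hang off TS\[3\] and TS\[6\], and TS\[0\] has none.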

> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
>                 Key: HIVE-18148
>                 URL: https://issues.apache.org/jira/browse/HIVE-18148
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null. The root
> cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
