hive-issues mailing list archives

From "liyunzhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-18148) NPE in SparkDynamicPartitionPruningResolver
Date Wed, 20 Dec 2017 04:31:02 GMT

    [ https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297875#comment-16297875 ]

liyunzhang commented on HIVE-18148:
-----------------------------------

Sorry for the late reply. I still have one question about the code:

{code}

  /** For DPP sinks w/ common join, we'll split the tree and what's above the branching
   * operator is computed multiple times. Therefore it may not be good for performance to support
   * nested DPP sinks, i.e. one DPP sink depends on other DPP sinks.
   * The following is an example:
   *
   *             TS          TS
   *             |           |
   *            ...         FIL
   *            |           |  \
   *            RS         RS  SEL
   *              \        /    |
   *     TS          JOIN      GBY
   *     |         /     \      |
   *    RS        RS    SEL   DPP2
   *     \       /       |
   *       JOIN         GBY
   *                    |
   *                  DPP1
   *
   * where DPP1 depends on DPP2.
   *
   * To avoid such case, we'll visit all the branching operators. If a branching operator has any
   * further away DPP branches w/ common join in its sub-tree, such branches will be removed.
   * In the above example, the branch of DPP1 will be removed.
   */
{code}

As I understand it, this function first collects the branching operators (FIL and JOIN in the
example above) and then removes the nested DPP sinks in their branches. If it traverses FIL
first, it removes DPP1; if it traverses JOIN first, it removes DPP2. So the function seems to
remove one of the nested DPPs depending on the traversal order. I am confused about how to
decide which DPP should be removed. Please correct me if my understanding is wrong.
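
One way to check this reading is a toy model of the rule exactly as the Javadoc words it. The sketch below is not Hive's implementation: the Op class, the prune method, the name JOIN2 for the lower join, and the use of path length as the measure of "further away" are all assumptions made for illustration. For each branching operator it finds the DPP sinks in that operator's sub-tree, keeps the nearest one, and drops the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class NestedDppSketch {

  /** Toy operator: a name plus children. Hypothetical, not Hive's Operator class. */
  public static final class Op {
    final String name;
    final List<Op> children = new ArrayList<>();

    Op(String name, Op... kids) {
      this.name = name;
      children.addAll(Arrays.asList(kids));
    }
  }

  /** Records each not-yet-removed DPP sink below op with its distance from the start. */
  static void collectDpp(Op op, int depth, Set<String> removed, Map<String, Integer> out) {
    if (op.name.startsWith("DPP") && !removed.contains(op.name)) {
      out.put(op.name, depth);
    }
    for (Op child : op.children) {
      collectDpp(child, depth + 1, removed, out);
    }
  }

  /** Visits the branching operators in the given order; returns the DPP sinks removed. */
  public static Set<String> prune(List<Op> branchingOps) {
    Set<String> removed = new TreeSet<>();
    for (Op branching : branchingOps) {
      Map<String, Integer> found = new LinkedHashMap<>();
      collectDpp(branching, 0, removed, found);
      if (found.size() < 2) {
        continue; // a single DPP sink in the sub-tree is not nested in anything
      }
      int nearest = Collections.min(found.values());
      for (Map.Entry<String, Integer> e : found.entrySet()) {
        if (e.getValue() > nearest) {
          removed.add(e.getKey()); // assumption: "further away" means a longer path
        }
      }
    }
    return removed;
  }

  /** Builds the part of the Javadoc example below FIL; returns {FIL, JOIN}. */
  public static Op[] example() {
    Op dpp1 = new Op("DPP1");
    Op dpp2 = new Op("DPP2");
    Op join = new Op("JOIN", new Op("RS", new Op("JOIN2")), new Op("SEL", new Op("GBY", dpp1)));
    Op fil = new Op("FIL", new Op("RS", join), new Op("SEL", new Op("GBY", dpp2)));
    return new Op[] {fil, join};
  }

  public static void main(String[] args) {
    Op[] ops = example();
    System.out.println("FIL first:  " + prune(Arrays.asList(ops[0], ops[1])));
    System.out.println("JOIN first: " + prune(Arrays.asList(ops[1], ops[0])));
  }
}
```

In this toy model both visiting orders remove only DPP1, because DPP2 is not in the sub-tree of JOIN. Whether the real code behaves the same way depends on how it defines and detects a nested branch, which this sketch does not establish.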

> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
>                 Key: HIVE-18148
>                 URL: https://issues.apache.org/jira/browse/HIVE-18148
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-18148.1.patch, HIVE-18148.2.patch
>
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
>         at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
>         at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
>         at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null. The root cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
