hive-issues mailing list archives

From "Rui Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs
Date Fri, 20 Oct 2017 11:31:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212502#comment-16212502 ]

Rui Li commented on HIVE-17193:
-------------------------------

The main challenge here is deciding whether two DPP works are different. In {{CombineEquivalentWorkResolver}},
we visit child tasks before their parents, so when we visit the target map works we
haven't seen the corresponding DPP works yet. The simplest solution is: if the IDs of the DPP works (tracked
by the target map works) differ, we consider the target map works different and don't combine them.
The drawback is that we'll lose some optimization opportunities. Actually, I'm not sure whether two
target map works can share the same DPP work in the current implementation.
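A minimal sketch of the ID-based check, assuming simplified, hypothetical classes ({{TargetMapWork}} and {{DppIdCheck}} are made up for illustration and are not Hive's real API): each target map work carries the IDs of the DPP works that prune it, and two otherwise-equivalent map works may be combined only when those ID sets are equal.

```java
import java.util.Set;

final class DppIdCheck {
    // Conservative veto, applied on top of the usual operator-tree equivalence check:
    // two map works may only be combined if they are targets of exactly the same DPP works.
    static boolean canCombine(TargetMapWork a, TargetMapWork b) {
        return a.dppWorkIds.equals(b.dppWorkIds);
    }

    public static void main(String[] args) {
        TargetMapWork left = new TargetMapWork("Map 1", Set.of(5));
        TargetMapWork right = new TargetMapWork("Map 3", Set.of(7));
        TargetMapWork plainA = new TargetMapWork("Map 4", Set.of());
        TargetMapWork plainB = new TargetMapWork("Map 6", Set.of());

        System.out.println(canCombine(left, right));    // false: pruned by different DPP works
        System.out.println(canCombine(plainA, plainB)); // true: neither map work is a DPP target
    }
}

// Hypothetical, simplified stand-in for a HoS map work; NOT Hive's real MapWork class.
final class TargetMapWork {
    final String name;
    // IDs of the DPP works whose output prunes this map work's input partitions.
    final Set<Integer> dppWorkIds;

    TargetMapWork(String name, Set<Integer> dppWorkIds) {
        this.name = name;
        this.dppWorkIds = dppWorkIds;
    }
}
```

Because the comparison is on IDs only, two DPP works that output the same records but have different IDs still block the combination; that is the lost optimization opportunity.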

Another solution is to walk the parent tasks first and combine equivalent DPP works. Two
DPP works can be considered equivalent as long as they output the same records; it shouldn't matter
how these records are used to prune different tables. As we combine the DPP works, we update
the information in the target map works accordingly (DPP works hold references to their target map
works). Then, when we visit the target map works later, we know whether they should be combined.
I'm working on a PoC patch to demonstrate the idea.
[~xuefuz], [~csun], [~stakiar], [~kellyzly] do you have any suggestions?
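The parent-first approach could look roughly like the following sketch. Everything in it is hypothetical: {{DppCombiner}} and {{DppWork}} are made-up names, a string {{outputSignature}} stands in for whatever test decides that two DPP works output the same records, and plain names stand in for the target map works. Pass 1 merges DPP works with the same output under one canonical ID, the target map works' pruning information is rewritten to those canonical IDs, and pass 2 compares the resulting sets.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class DppCombiner {
    // Pass 1, over the parent tasks: DPP works with the same output signature are
    // treated as equivalent; map each DPP work ID to the ID of one canonical representative.
    static Map<Integer, Integer> canonicalIds(List<DppWork> dppWorks) {
        Map<String, Integer> firstBySignature = new HashMap<>();
        Map<Integer, Integer> canonical = new HashMap<>();
        for (DppWork w : dppWorks) {
            int rep = firstBySignature.computeIfAbsent(w.outputSignature, s -> w.id);
            canonical.put(w.id, rep);
        }
        return canonical;
    }

    // Use each DPP work's references to its targets to record, per target map work,
    // which canonical DPP works prune it.
    static Map<String, Set<Integer>> prunedBy(List<DppWork> dppWorks, Map<Integer, Integer> canonical) {
        Map<String, Set<Integer>> result = new HashMap<>();
        for (DppWork w : dppWorks) {
            for (String target : w.targetMapWorks) {
                result.computeIfAbsent(target, t -> new TreeSet<>()).add(canonical.get(w.id));
            }
        }
        return result;
    }

    // Pass 2, over the child tasks: otherwise-equivalent map works are combined only
    // if they are pruned by the same canonical DPP works.
    static boolean canCombine(String a, String b, Map<String, Set<Integer>> prunedBy) {
        return prunedBy.getOrDefault(a, Set.of()).equals(prunedBy.getOrDefault(b, Set.of()));
    }

    public static void main(String[] args) {
        List<DppWork> dppWorks = List.of(
                new DppWork(5, "src.key", List.of("Map 1")),
                new DppWork(7, "src.value", List.of("Map 3")),
                new DppWork(8, "src.key", List.of("Map 6")));
        Map<String, Set<Integer>> pruned = prunedBy(dppWorks, canonicalIds(dppWorks));

        System.out.println(canCombine("Map 1", "Map 3", pruned)); // false: different DPP output
        System.out.println(canCombine("Map 1", "Map 6", pruned)); // true: DPP 8 merged into DPP 5
    }
}

// Hypothetical DPP work: outputSignature stands in for "which records it outputs";
// a real implementation would presumably compare the DPP works' operator trees instead.
final class DppWork {
    final int id;
    final String outputSignature;
    final List<String> targetMapWorks;

    DppWork(int id, String outputSignature, List<String> targetMapWorks) {
        this.id = id;
        this.outputSignature = outputSignature;
        this.targetMapWorks = targetMapWorks;
    }
}
```

In this toy setup, {{Map 1}} and {{Map 6}} can still be combined because their DPP works are equivalent, which an ID-only comparison would forbid.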

> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
>                 Key: HIVE-17193
>                 URL: https://issues.apache.org/jira/browse/HIVE-17193
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}
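Why the two {{srcpart}} scans in this query must not be combined: both map works have the same operators, but one is pruned by the DPP on {{src.key}} and the other by the DPP on {{src.value}}, so in general they read different partitions. A toy illustration with made-up {{ds}} values (hypothetical data, not the real test tables; {{PruningDemo}} is an illustrative name):

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class PruningDemo {
    // Keep only the partitions whose ds value appears in the DPP output.
    static Set<String> prune(List<String> partitions, Set<String> dppOutput) {
        Set<String> kept = new TreeSet<>();
        for (String ds : partitions) {
            if (dppOutput.contains(ds)) {
                kept.add(ds);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> srcpartDs = List.of("d1", "d2", "d3"); // hypothetical partitions of srcpart
        Set<String> srcKeys = Set.of("d1", "x");            // hypothetical DPP output of src.key
        Set<String> srcValues = Set.of("d3", "y");          // hypothetical DPP output of src.value

        System.out.println(prune(srcpartDs, srcKeys));   // [d1]  (input of subquery a)
        System.out.println(prune(srcpartDs, srcValues)); // [d3]  (input of subquery b)
    }
}
```

A combined map work would read a single set of partitions for both branches, so at least one branch would presumably see the wrong input.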



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
