hive-issues mailing list archives

From "Sahil Takiar (JIRA)" <>
Subject [jira] [Commented] (HIVE-17193) HoS: don't combine map works that are targets of different DPPs
Date Mon, 23 Oct 2017 18:24:00 GMT


Sahil Takiar commented on HIVE-17193:

{quote} The drawback is we'll lose some optimization opportunities - actually I'm not sure
whether it's possible that two target map works share the same DPP in current implementation.
{quote}
As far as I know, this isn't possible. A DPP subtree can only be used to prune a single
target {{MapWork}}, although that is something we want to change in HIVE-17178.

{quote} Two DPP works can be considered equivalent as long as they output same records. {quote}
I'm not sure how this would work: you don't know what a DPP work will output until the query
actually starts to run.

I think a good fix here would be to just implement HIVE-17178 (I'm not sure, but this may
be the same as HIVE-17877). If two DPP sinks are completely equivalent (same source table,
filters, operations, etc.) and differ only in the value of {{Target Work}}, then I think
we should be able to combine them into a single DPP tree with multiple target works. The
value of the target work shouldn't change the data that is written by a DPP subtree,
so if the subtrees are otherwise equivalent, we can combine them. The main work will be
changing the DPP code so that there can be multiple Target Works.
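
To make the idea concrete, here is a rough sketch of the combining step. It is not Hive code: {{DppSink}}, {{signature()}} and {{combine()}} are hypothetical names invented for illustration, and the sketch assumes a sink's output is fully determined by its source table, filter and source column. Sinks with the same signature are collapsed into one sink that carries the union of the target works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DppSinkMergeSketch {

    /** Hypothetical stand-in for a DPP sink; this is not a Hive class. */
    static final class DppSink {
        final String sourceTable;
        final String filter;
        final String sourceColumn;
        final Set<String> targetWorks = new LinkedHashSet<>();

        DppSink(String sourceTable, String filter, String sourceColumn, String... targetWorks) {
            this.sourceTable = sourceTable;
            this.filter = filter;
            this.sourceColumn = sourceColumn;
            this.targetWorks.addAll(Arrays.asList(targetWorks));
        }

        /** Everything assumed to determine the written data; target works are left out on purpose. */
        List<String> signature() {
            return Arrays.asList(sourceTable, filter, sourceColumn);
        }
    }

    /** Collapses sinks with equal signatures into one sink with several target works. */
    static List<DppSink> combine(List<DppSink> sinks) {
        Map<List<String>, DppSink> bySignature = new LinkedHashMap<>();
        for (DppSink sink : sinks) {
            // The first sink with a given signature creates the merged entry (no targets yet).
            DppSink merged = bySignature.computeIfAbsent(sink.signature(),
                    key -> new DppSink(sink.sourceTable, sink.filter, sink.sourceColumn));
            merged.targetWorks.addAll(sink.targetWorks);
        }
        return new ArrayList<>(bySignature.values());
    }

    public static void main(String[] args) {
        // Work names are made up; the first and third sinks differ only in their target.
        List<DppSink> sinks = Arrays.asList(
                new DppSink("src", "", "key", "Map 1"),
                new DppSink("src", "", "value", "Map 3"),
                new DppSink("src", "", "key", "Map 5"));
        for (DppSink sink : combine(sinks)) {
            System.out.println(sink.signature() + " -> " + sink.targetWorks);
        }
    }
}
```

Note that the two DPPs in the query from the description would not be merged by this check, since one is built from {{src.key}} and the other from {{src.value}}; only sinks that differ in nothing but the target collapse.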

> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>                 Key: HIVE-17193
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}
