hive-issues mailing list archives

From "liyunzhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-17486) Enable SharedWorkOptimizer in tez on HOS
Date Thu, 02 Nov 2017 03:36:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235125#comment-16235125 ]

liyunzhang edited comment on HIVE-17486 at 11/2/17 3:35 AM:
------------------------------------------------------------

[~lirui]:
{quote}
My understanding is HoS also supports one Map connecting to multiple Reducers 
{quote}
In HoS, a Map work contains only one RS (ReduceSink). It is true that in some cases one Map
work is used by two Reducers in HoS. In HoT, however, one Map work can contain two RS operators,
and each of them can send different data to a different Reducer.
{quote}
The problem here is HoS doesn't merge equivalent works as aggressively as HoT does. 
{quote}
yes
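The RS difference described above can be illustrated with a toy model. This is a hypothetical Python sketch, not Hive's operator classes: `MapWork`, `ReduceSink`, the row predicates and the Reducer names are all assumptions made for illustration. A Map work scans its input once; the HoS-style work has a single RS, while the HoT-style work has two RS outputs that each route different rows to a different Reducer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical model (not Hive's real classes): one RS output feeding one Reducer.
@dataclass
class ReduceSink:
    reducer: str                    # name of the downstream Reducer
    accepts: Callable[[int], bool]  # which rows this RS emits


# Hypothetical Map work that scans its input once and feeds every RS it contains.
@dataclass
class MapWork:
    sinks: List[ReduceSink] = field(default_factory=list)

    def run(self, rows: List[int]) -> Dict[str, List[int]]:
        out: Dict[str, List[int]] = {rs.reducer: [] for rs in self.sinks}
        for row in rows:            # single pass over the input
            for rs in self.sinks:
                if rs.accepts(row):
                    out[rs.reducer].append(row)
        return out


# HoS-style: only one RS in the Map work.
hos = MapWork([ReduceSink("Reducer 2", lambda r: r % 2 == 0)])

# HoT-style: two RS in one Map work, each sending different data
# to a different Reducer.
hot = MapWork([
    ReduceSink("Reducer 2", lambda r: r % 2 == 0),
    ReduceSink("Reducer 3", lambda r: r > 2),
])

print(hos.run([1, 2, 3, 4]))  # {'Reducer 2': [2, 4]}
print(hot.run([1, 2, 3, 4]))  # {'Reducer 2': [2, 4], 'Reducer 3': [3, 4]}
```

In the HoS-style model, two Reducers consuming the same Map work would both read the output of its single RS, i.e. the same data; only the multi-RS form lets one Map work send different data to each Reducer.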


was (Author: kellyzly):
[~lirui]:
{quote}
My understanding is HoS also supports one Map connecting to multiple Reducers 
{quote}
In HoS, a Map work contains only one RS (ReduceSink). It is true that in some cases one Map
work is used by two Reducers in HoS. In HoT, however, one Map work can contain two RS operators,
and each of them can send different data to a different Reducer.

> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
>                 Key: HIVE-17486
>                 URL: https://issues.apache.org/jira/browse/HIVE-17486
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang
>            Assignee: liyunzhang
>            Priority: Major
>         Attachments: scanshare.after.svg, scanshare.before.svg
>
>
> HIVE-16602 implemented shared scans with Tez.
> Given a query plan, the goal is to identify scans on input tables that can be merged so the
> data is read only once. The optimization is carried out at the physical level. In Hive on
> Spark, the result of a Spark work is cached if that work is used by more than one child
> Spark work. After SharedWorkOptimizer is enabled in the physical plan in HoS, identical table
> scans are merged into one table scan, whose result is then used by more than one child Spark
> work. Thanks to the caching mechanism, the same computation does not need to be repeated.
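A minimal sketch of the idea in the description, assuming a much simplified plan model: each child Spark work reads from one table scan, and two scans count as identical when they read the same table with the same columns. `TableScan`, `merge_scans`, `scans_to_cache` and the table and column names are hypothetical; this is not the actual SharedWorkOptimizer or HoS caching code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical scan descriptor; frozen so identical scans hash and compare equal.
@dataclass(frozen=True)
class TableScan:
    table: str
    columns: Tuple[str, ...]


def merge_scans(children: Dict[str, TableScan]) -> Dict[TableScan, List[str]]:
    """Merge identical scans: map each distinct scan to the child works reading it."""
    merged: Dict[TableScan, List[str]] = defaultdict(list)
    for child, scan in children.items():
        merged[scan].append(child)
    return dict(merged)


def scans_to_cache(merged: Dict[TableScan, List[str]]) -> List[TableScan]:
    """Cache a scan's result only when more than one child work uses it."""
    return [scan for scan, users in merged.items() if len(users) > 1]


# Hypothetical plan: two child works scan the same table with the same columns.
plan = {
    "Reducer 2": TableScan("store_sales", ("ss_item_sk", "ss_quantity")),
    "Reducer 3": TableScan("store_sales", ("ss_item_sk", "ss_quantity")),
    "Reducer 4": TableScan("item", ("i_item_sk",)),
}
merged = merge_scans(plan)
print(len(merged))             # 2 (three scans merged into two)
print(scans_to_cache(merged))  # only the shared store_sales scan
```

Merging happens first so that each remaining scan's number of consumers is known; the caching decision then reduces to checking whether a merged scan has more than one child work.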



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
