hive-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-24081) Enable pre-materializing CTEs referenced in scalar subqueries
Date Thu, 27 Aug 2020 10:07:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-24081?focusedWorklogId=475219&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-475219 ]

ASF GitHub Bot logged work on HIVE-24081:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Aug/20 10:06
            Start Date: 27/Aug/20 10:06
    Worklog Time Spent: 10m 
      Work Description: kasakrisz opened a new pull request #1437:
URL: https://github.com/apache/hive/pull/1437


   ### What changes were proposed in this pull request?
   * Run phase 1 parsing on subquery expressions so that CTE references inside those subqueries are counted
   * Add a config that restricts CTE materialization to CTEs with fully aggregated output
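To illustrate why phase 1 parsing of subquery expressions matters for the reference count, here is a minimal Python sketch over a toy query tree. `TableRef`, `Query`, and `count_cte_refs` are hypothetical names that do not correspond to Hive's planner classes; the sketch only assumes that, without this step, references appearing solely inside subquery expressions are not counted.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class TableRef:
    """Hypothetical FROM-clause reference to a table or CTE by name."""
    name: str


@dataclass
class Query:
    """Hypothetical, simplified query block."""
    # FROM-clause sources: plain table/CTE references or nested query blocks.
    sources: List[Union[TableRef, "Query"]] = field(default_factory=list)
    # Subqueries used inside expressions, e.g. scalar subqueries in WHERE or SELECT.
    subquery_exprs: List["Query"] = field(default_factory=list)


def count_cte_refs(q: Query, cte_names: set, include_subquery_exprs: bool) -> dict:
    """Count how often each CTE name is referenced anywhere in the tree."""
    counts = {name: 0 for name in cte_names}

    def visit(node):
        if isinstance(node, TableRef):
            if node.name in counts:
                counts[node.name] += 1
            return
        for src in node.sources:
            visit(src)
        # Descending into subquery expressions models the new phase 1 step.
        if include_subquery_exprs:
            for sub in node.subquery_exprs:
                visit(sub)

    visit(q)
    return counts


def scalar_subquery() -> Query:
    # e.g. (SELECT average FROM avg_sales)
    return Query(sources=[TableRef("avg_sales")])


# Three query blocks, each filtering against the same aggregate CTE.
main = Query(sources=[
    Query(sources=[TableRef("sales_a")], subquery_exprs=[scalar_subquery()]),
    Query(sources=[TableRef("sales_b")], subquery_exprs=[scalar_subquery()]),
    Query(sources=[TableRef("sales_c")], subquery_exprs=[scalar_subquery()]),
])

print(count_cte_refs(main, {"avg_sales"}, include_subquery_exprs=False))  # {'avg_sales': 0}
print(count_cte_refs(main, {"avg_sales"}, include_subquery_exprs=True))   # {'avg_sales': 3}
```

Without traversing subquery expressions, the CTE looks unreferenced, so the reference-count threshold never sees it.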
   
   
   ### Why are the changes needed?
   Improve the performance of complex queries that reference the same fully aggregate CTE more than once.
   
   ### Does this PR introduce _any_ user-facing change?
   Adds a new config to HiveConf: `hive.optimize.cte.materialize.full.aggregate.only`.
   Prior to this patch, if `hive.optimize.cte.materialize.threshold` was higher than -1, all non-subquery CTEs that were referenced more times than the threshold were materialized. This patch restricts materialization to fully aggregate CTEs by default. The original behavior can be restored by setting `hive.optimize.cte.materialize.full.aggregate.only` to false.
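The decision described above can be modeled as a small predicate. This is a hedged Python sketch, not Hive's code: `should_materialize` is a hypothetical name, "fully aggregate" is assumed to mean a CTE whose output is fully aggregated (for example a global aggregate with no GROUP BY), and the strict "more than the threshold" comparison follows the wording above rather than a confirmed reading of the Hive source.

```python
def should_materialize(ref_count: int, threshold: int, is_full_aggregate: bool,
                       full_aggregate_only: bool = True) -> bool:
    """Hypothetical model of the CTE materialization decision.

    threshold           ~ hive.optimize.cte.materialize.threshold (-1 disables it)
    full_aggregate_only ~ hive.optimize.cte.materialize.full.aggregate.only (default true)
    """
    # The feature is only active when the threshold is higher than -1.
    if threshold < 0:
        return False
    # Assumption: "referenced more times than the threshold" read as a strict comparison.
    if ref_count <= threshold:
        return False
    # With the new config enabled (the default), only fully aggregate CTEs qualify.
    if full_aggregate_only and not is_full_aggregate:
        return False
    return True


# A CTE referenced 4 times with threshold 3:
print(should_materialize(4, 3, is_full_aggregate=True))    # True
print(should_materialize(4, 3, is_full_aggregate=False))   # False: new default
print(should_materialize(4, 3, is_full_aggregate=False,
                         full_aggregate_only=False))       # True: pre-patch behavior
```

Setting `full_aggregate_only=False` in the sketch corresponds to restoring the original behavior described above.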
   
   ### How was this patch tested?
   * New q tests were added.
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=cte_mat_6.q -pl itests/qtest -Pitests
   ```
   * Run query14 with `set hive.optimize.cte.materialize.threshold=3;`
   ```
   mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestTezPerfCliDriver -Dqfile=query14.q -pl itests/qtest -Pitests
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 475219)
    Remaining Estimate: 0h
            Time Spent: 10m

> Enable pre-materializing CTEs referenced in scalar subqueries
> -------------------------------------------------------------
>
>                 Key: HIVE-24081
>                 URL: https://issues.apache.org/jira/browse/HIVE-24081
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Planning
>            Reporter: Krisztian Kasa
>            Assignee: Krisztian Kasa
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> HIVE-11752 introduced CTE materialization, controlled by the config
> {code}
> hive.optimize.cte.materialize.threshold
> {code}
> The goal of this jira is to:
> * extend the implementation to support materializing CTEs referenced in scalar subqueries
> * add a config to materialize CTEs with aggregate output only



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
