trafodion-codereview mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DaveBirdsall <...@git.apache.org>
Subject [GitHub] trafodion pull request #1443: [TRAFODION-2645] Fixes to MDAM costing code fo...
Date Wed, 14 Feb 2018 01:03:27 GMT
GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/trafodion/pull/1443

    [TRAFODION-2645] Fixes to MDAM costing code found during inner join tests

    This set of changes fixes some issues that were found in the new MDAM costing code while
testing it for the case of an inner table of a nested join.
    
    The fixes are:
    
    1. We were way too pessimistic in calculating the number of seeks. We assumed we would
do a seek per block, but it is obvious that the number of seeks is bounded from above by the
number of subsets. This showed up when testing queries with tiny numbers of probes (4) but
large numbers of rows per sequential subset (over 100 blocks worth in total).
    
    2. We were applying some rowsize adjustment factors twice, once in NewMDAMCostWA::compute,
and again in FileScanOptimizer::scmComputeMDAMCostForHbase, which it calls. The duplicate
logic was removed in NewMDAMCostWA::compute.
    
    3. We were overestimating the number of fetch subset rows when costing MDAM for the inner
table of a nested join. It turns out that the histogram handling does a cross product scale-up
of the histograms in order to calculate the effect of a join predicate. But this scale-up
looks at things from the standpoint of the complete result of the query, while the MDAM costing
logic is looking at things from the standpoint of a single outer table join probe. Dividing
fetch row counts by the number of outer table join probes corrects the problem. Then at a
higher level we multiply by the number of outer table join probes (covering both MDAM probes
and MDAM fetches) to calculate the total cost of the MDAM scan.
    
    Notes: The new MDAM costing code is still off by default. To turn it on, use CQD MDAM_COSTING_REWRITE
'ON'. To test MDAM on the inner table of nested joins, one must either force such plans with
control query shape or turn off heuristics that prevent MDAM in these cases. There are two
of main interest: CQD MDAM_UNDER_NJ_PROBES_THRESHOLD turns off MDAM when the number of outer
table probes is higher than its value. It defaults to '0' so by default MDAM is (almost) always
turned off on the inner table of nested joins. Setting this to a positive value (I used '1000'
for my testing) allows MDAM. Secondly, there is a heuristic in the JoinToTSJ transformation
rule that prevents nested joins if no inner table key prefix is covered by some predicate.
This prevents consideration of nested joins where MDAM may help when a predicate is missing
on the leading column. Setting CQD KEYLESS_NESTED_JOINS 'ON' overrides this heuristic. Alternatively,
turning off hash joins and merge joins also ove
 rrides this heuristic.
    
    Summarizing, if one wishes to test the new MDAM costing code, and wishes to benefit from
nested joins using MDAM on the inner table, then one should presently set these CQDs:
    
    ```
    CQD MDAM_UNDER_NJ_PROBES_THRESHOLD '1000';  -- (some positive value greater than one)
    CQD KEYLESS_NESTED_JOINS 'ON';
    CQD MDAM_COSTING_REWRITE 'ON';
    ```
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/trafodion MdamRewriteNJ

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/trafodion/pull/1443.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1443
    
----
commit c008ebe519f51e5344b650a9c88e66e973320084
Author: Dave Birdsall <dbirdsall@...>
Date:   2018-02-14T00:42:41Z

    [TRAFODION-2645] Fixes to MDAM costing code found during inner join testing

----


---

Mime
View raw message