trafodion-codereview mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DaveBirdsall <...@git.apache.org>
Subject [GitHub] incubator-trafodion pull request #744: [TRAFODION-2251] Fix upd stats issues...
Date Tue, 04 Oct 2016 21:48:59 GMT
GitHub user DaveBirdsall opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/744

    [TRAFODION-2251] Fix upd stats issues with long char/varchar columns

    The following issues in UPDATE STATISTICS concerning long char/varchar columns have been
fixed:
    
    1. Creating a sample table would fail if the source table was a Hive table with columns
longer than the Trafodion maximum (200000 bytes).
    2. UPDATE STATISTICS might fail because the Optimizer raised a 2053 warning during pass
2 optimization.  This warning happens on some queries on Hive tables having columns of over
1 million bytes in length. Now UPDATE STATISTICS will ignore the warning and continue. (The
Optimizer does generate a plan in spite of the warning.)
    3. UPDATE STATISTICS might abend resulting in a core file when sorting varchars of longer
than 32767 bytes.
    
    Design notes:
    
    The 32767 limitation is due to the ISVarChar class using a 2-byte length field. I chose
to not increase this to 4 bytes as that would reduce the amount of memory available for existing
UPDATE STATISTICS scenarios. Instead, I chose to truncate longer values to fit within 32767
bytes using the SQL SUBSTRING function. The result is that UEC might be underestimated for
such columns, for example if the first 32767 bytes of two values are equal and the values
differ at the 32768th or later byte. We can lift this limitation either by accepting the loss
of memory to a longer length field, or by adding another class, ISLongVarChar say, for these
long values. That work is left for the future, when needed.
    
    The logic to create the sample table used to use CREATE TABLE LIKE. This doesn't work
for Hive source tables having char/varchar columns of longer than 200000 bytes, as Trafodion
imposes a 200000 byte limit on its own columns. Instead, we use CREATE TABLE AS SELECT, and
use SUBSTRING to limit the size of the long char/varchar columns. (We also use SUBSTRING on
the subsequent UPSERT used to populate the table.)
    
    CREATE TABLE AS SELECT did not support running in a user transaction, but UPDATE STATISTICS
needs to do so. Fortunately, we only use the NO LOAD flavor of CREATE TABLE AS SELECT, so
there is no non-transactional load step. Therefore, the ExExeUtilLoad tcb has been changed
to support running in a user transaction if NO LOAD has been specified.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DaveBirdsall/incubator-trafodion Trafodion2251

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/744.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #744
    
----
commit 8a7fe53736fb5720199e23864910df3cb092155d
Author: Dave Birdsall <dbirdsall@apache.org>
Date:   2016-10-04T21:35:42Z

    [TRAFODION-2251] Fix upd stats issues with long char/varchar columns

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message