trafodion-codereview mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From zellerh <...@git.apache.org>
Subject [GitHub] incubator-trafodion pull request: [Trafodion 762] support HIVE dat...
Date Tue, 24 May 2016 16:51:03 GMT
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/496#discussion_r64430466
  
    --- Diff: core/sql/optimizer/NATable.cpp ---
    @@ -3553,6 +3553,48 @@ NAType* getSQColTypeForHive(const char* hiveType, NAMemory* heap)
       if ( !strcmp(hiveType, "timestamp"))
         return new (heap) SQLTimestamp(TRUE /* allow NULL */ , 6, heap);
     
    +  if ( !strcmp(hiveType, "date"))
    +    return new (heap) SQLDate(TRUE /* allow NULL */ , heap);
    +
    +  if ( !strncmp(hiveType, "varchar", 7) )
    +  {
    +    char maxLen[32];
    +    memset(maxLen, 0, 32);
    +    int i=0,j=0;
    +    int copyit = 0;
    +
    +    //get length
    +    for(i = 0; i < strlen(hiveType) ; i++)
    +    {
    +      if(hiveType[i] == '(') //start
    +      {
    +        copyit=1;
    +        continue;
    +      }
    +      else if(hiveType[i] == ')') //stop
    +        break; 
    +      if(copyit > 0)
    +      {
    +        maxLen[j] = hiveType[i];
    +        j++;
    +      }
    +    }
    +    Int32 len = atoi(maxLen);
    +
    +    if(len == 0) return NULL;  //cannot parse correctly
    +
    +    NAString hiveCharset =
    +        ActiveSchemaDB()->getDefaults().getValue(HIVE_DEFAULT_CHARSET);
    +
    +    return new (heap) SQLVarChar(CharLenInfo((hiveCharset == CharInfo::UTF8 ? 0 : len),len),
    --- End diff --
    
    This creates a type VARCHAR(n BYTES) CHARACTER SET UTF8, if HIVE_DEFAULT_CHARSET is UTF8.
Hive, however, does not have this optimization, and it will allow n *characters* in the column.
I hate to waste so much space, but I think we need to multiply the second "len" by the max
length in bytes of a character.
    
    I just also note that this code compares the string hiveCharset with CharInfo::UTF8, which
is a number. You copied this code from the "string" case, which also may be wrong. We should
be converting this to upper case and comparing it to SQLCHARSETSTRING_UTF8. It probably works
by chance, the comparison is always FALSE, so we always pass "len" to the CharLenInfo constructor,
but this is irrelevant for anything other than UTF8.
    
    Here is my suggested code:
    
    ```C++
    hiveCharset.toUpper();
    CharInfo::CharSet hiveCharsetEnum = CharInfo::getCharSetEnum(hiveCharset);
    Int32 maxNumChars = 0;
    Int32 storageLen = len;
    ```
    
    The following applies to the VARCHAR case only, but the code above and below should also
be changed for the STRING case:
    
    ```C++
    if (CharInfo::isVariableWidthMultiByteCharSet(hiveCharsetEnum))
      {
        // For Hive VARCHARs, the number specified is the max. number of characters,
        // while we count in bytes when using HIVE_MAX_STRING_LENGTH for Hive STRING
        // columns. Set the max character constraint and also adjust the required storage
length.
        maxNumChars = len;
        storageLen = len * CharInfo::maxBytesPerChar(hiveCharsetEnum);
      }
    ```
    
    Again, this applies to both cases:
    
    ```C++
    return new (heap) SQLVarChar(CharLenInfo(maxNumChars, storageLen),
                                 ...
    ```
    
    Note that we are now treating STRING and VARCHAR differently. Let's say HIVE_MAX_STRING_LENGTH
is set to 32.
    
    STRING becomes VARCHAR(32 BYTES) CHARACTER SET UTF8.
    
    VARCHAR(32) becomes VARCHAR(32) CHARACTER SET UTF8.
    
    Is that ok or do we need still more CQDs or a different default behavior?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message