trafodion-codereview mailing list archives

From zellerh <...@git.apache.org>
Subject [GitHub] incubator-trafodion pull request #634: JIRA TRAFODION-2137 metadata access p...
Date Wed, 03 Aug 2016 01:20:50 GMT
Github user zellerh commented on a diff in the pull request:

    https://github.com/apache/incubator-trafodion/pull/634#discussion_r73267154
  
    --- Diff: core/sql/sqlcomp/CmpSeabaseDDLcommon.cpp ---
    @@ -6019,6 +6181,93 @@ short CmpSeabaseDDL::updateTextTable(ExeCliInterface *cliInterface,
       return 0;
     }
     
    +short CmpSeabaseDDL::updateTextTableWithBinaryData
    +(ExeCliInterface *cliInterface,
    + Int64 objUID, 
    + ComTextType textType, 
    + Lng32 subID, 
    + char * inputData,
    + Int32 inputDataLen,
    + NABoolean withDelete)
    +{
    +  Lng32 cliRC = 0;
    +  if (withDelete)
    +    {
    +      // Note: It might be tempting to try an upsert instead of a
    +      // delete followed by an insert, but this won't work. It is
    +      // possible that the metadata text could shrink and take fewer
    +      // rows in its new form than the old. So we do the simple thing
    +      // to avoid such complications.
    +      cliRC = deleteFromTextTable(cliInterface, objUID, textType, subID);
    +      if (cliRC < 0)
    +        {
    +          return -1;
    +        }
    +    }
    +
    +  // convert input data to utf8 first.
    +  ComDiagsArea * diagsArea = CmpCommon::diags();
    +  char * inputDataUTF8 = new(STMTHEAP) char[inputDataLen*4];
    +  Lng32 inputDataLenUTF8 = 0;
    +  ex_expr::exp_return_type rc =
    +    convDoIt(inputData,
    --- End diff --
    
    Treating the binary data as ISO88591 and converting that to UTF-8 is one way, but it
is not very efficient and it produces unprintable characters. Another option would be to
use 64 printable characters, such as '0' (zero, 0x30) through 'o' (lower case O, 0x6F), chop
the binary data into 6-bit pieces, and encode each piece as one of these characters. The
overhead would be 33%, since every 3 input bytes become 4 output characters. The ISO88591
method has between 0 and 100% overhead, because bytes below 0x80 stay one byte in UTF-8
while bytes 0x80 and above become two; it is probably less than 50% in typical cases. The
6-bit encoding may not be worth it for the space savings alone, but it would avoid the
unprintable characters.
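
    A minimal sketch of the 6-bit idea, to make the arithmetic concrete. This is not
code from the pull request: the names encode6Bit, decode6Bit and utf8LengthOfLatin1 are
hypothetical, and it uses std::string and std::vector rather than Trafodion's heap and
conversion routines. It assumes the '0'..'o' alphabet mentioned above and pads the last
partial group with zero bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: encode bytes as printable ASCII, 6 bits per character.
// Each 6-bit value v (0..63) maps to the character '0' + v, i.e. '0'..'o'.
std::string encode6Bit(const std::vector<std::uint8_t>& in)
{
  std::string out;
  out.reserve((in.size() * 8 + 5) / 6);
  std::uint32_t acc = 0;  // bit accumulator
  int bits = 0;           // number of valid low-order bits in acc
  for (std::uint8_t b : in)
    {
      acc = (acc << 8) | b;
      bits += 8;
      while (bits >= 6)
        {
          bits -= 6;
          out.push_back(static_cast<char>('0' + ((acc >> bits) & 0x3F)));
        }
      acc &= (1u << bits) - 1;  // keep only the bits not yet emitted
    }
  if (bits > 0)  // pad the final partial group with zero bits on the right
    out.push_back(static_cast<char>('0' + ((acc << (6 - bits)) & 0x3F)));
  return out;
}

// Hypothetical helper: inverse of encode6Bit. Assumes the input contains only
// characters in '0'..'o'. Trailing bits that do not fill a byte are padding.
std::vector<std::uint8_t> decode6Bit(const std::string& in)
{
  std::vector<std::uint8_t> out;
  out.reserve(in.size() * 6 / 8);
  std::uint32_t acc = 0;
  int bits = 0;
  for (char c : in)
    {
      acc = (acc << 6) | (static_cast<std::uint32_t>(c - '0') & 0x3F);
      bits += 6;
      if (bits >= 8)
        {
          bits -= 8;
          out.push_back(static_cast<std::uint8_t>((acc >> bits) & 0xFF));
        }
      acc &= (1u << bits) - 1;
    }
  return out;
}

// Size of the same data under the ISO88591 -> UTF-8 approach, for comparison:
// bytes below 0x80 take one UTF-8 byte, bytes 0x80..0xFF take two.
std::size_t utf8LengthOfLatin1(const std::vector<std::uint8_t>& in)
{
  std::size_t len = in.size();
  for (std::uint8_t b : in)
    if (b >= 0x80)
      ++len;
  return len;
}
```

    The decoder needs no stored length: n input bytes produce ceil(8n/6) characters, and
the leftover padding is always fewer than 8 bits, so it never yields an extra byte.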
    
    Of course you can blame me for suggesting the ISO88591 method in the first place, so I
guess I shouldn't complain about it now :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
