drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] paul-rogers commented on issue #1726: DRILL-7143: Support default value for empty columns
Date Sun, 07 Apr 2019 23:10:58 GMT
paul-rogers commented on issue #1726: DRILL-7143: Support default value for empty columns
URL: https://github.com/apache/drill/pull/1726#issuecomment-480639697
 
 
   @arina-ielchiieva, your summary is mostly correct. Just to refine a bit.
   
   By default, if code using the row set framework asks to convert strings to other types,
blanks have no special meaning. A blank will be parsed as any other string, which typically
produces an error.
   
   Any client of the row set framework can specify a blank-handling policy by setting an
internal property named `blank-as`. There are four choices:
   
   * Unset: use the default policy described above.
   * `null`: If the column is nullable, treat the blank as null. If non-nullable, leave the
blank unchanged.
   * `0`: Replace blanks with the value "0" for numeric types.
   * `skip`: Skip blank values. This will set the column to its default value: `NULL` for
nullable columns, the default value for non-nullable columns. If no default is set, then the
"default default" of all-zero bytes is used.
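
As a rough illustration of the four choices above, here is a small Python model of how a blank might be handled for an INT column. It is only a sketch, not Drill's actual Java code: the function `convert_int` and its parameters are hypothetical, and the all-zero-bytes "default default" is assumed to mean `0` for an integer.

```python
from typing import Optional


def convert_int(text: str,
                blank_as: Optional[str] = None,
                nullable: bool = False,
                default: Optional[int] = None) -> Optional[int]:
    """Hypothetical model of the `blank-as` choices for an INT column.

    `None` as a return value stands in for SQL NULL.
    """
    if text != "":
        # Non-blank values are parsed normally under every policy.
        return int(text)

    if blank_as == "null" and nullable:
        # `null`: a blank becomes NULL, but only for nullable columns.
        return None
    if blank_as == "0":
        # `0`: the blank is replaced with the string "0" before parsing.
        return int("0")
    if blank_as == "skip":
        # `skip`: the value is never written, so the column keeps its default.
        if nullable:
            return None
        # Assumption: all-zero bytes read back as 0 for an integer column.
        return default if default is not None else 0

    # Unset policy, or `null` on a non-nullable column: the blank is parsed
    # like any other string, which raises ValueError for an integer.
    return int(text)
```

With this model, `convert_int("", "skip", default=-1)` yields the column default, while `convert_int("")` raises an error, matching the default behavior described earlier.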
   
   (Note that I renamed "simple" to "skip".)
   
   Normally, the blank policy is set by the reader. For example, for CSV, it seemed to make
sense to use the `skip` policy.
   
   But, to provide maximum flexibility (and because there are many different requirements),
the user can also optionally set the `drill.blank-as` property on a column. If set, that property
overrides anything the reader may have set. For example, suppose I want to use -1 for missing
columns, but 0 for blank columns. I could set the column default value to -1, then set the
`drill.blank-as` property to `0`.
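
The precedence rule can be pictured with a short Python sketch. The helper `effective_blank_policy` and the dictionary of column properties are hypothetical stand-ins for illustration; only the property name `drill.blank-as` comes from the description above.

```python
from typing import Mapping, Optional

BLANK_AS_PROP = "drill.blank-as"  # property name as given in the comment


def effective_blank_policy(reader_policy: Optional[str],
                           column_props: Mapping[str, str]) -> Optional[str]:
    """Return the column-level policy if set, else the reader's policy."""
    return column_props.get(BLANK_AS_PROP, reader_policy)


# The CSV reader chooses `skip`; a column with no property inherits it.
inherited = effective_blank_policy("skip", {})

# A column that sets the property to `0` overrides the reader's choice,
# so blanks become 0 while missing columns still get the column default.
overridden = effective_blank_policy("skip", {BLANK_AS_PROP: "0"})
```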
   
   The bottom line for users of the CSV file format, with a schema, is that, by default, blanks
are skipped and become `NULL` for nullable columns or the default value for non-nullable columns.
   
   Note also that this change strips leading and trailing white space from columns prior to
type conversion. So a value of "  " is trimmed to "" and treated as a blank string. Trimming
is *not* done for values stored as VARCHAR. In this case, if the value is "  ", that is what
will be stored in the vector.
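
A minimal Python sketch of the trimming rule follows. The function name and the use of a type-name string are hypothetical, and Python's `str.strip()` is only an approximation of whatever whitespace definition the Drill code uses.

```python
def prepare_for_conversion(raw: str, column_type: str) -> str:
    """Trim surrounding whitespace unless the column is stored as VARCHAR."""
    if column_type.upper() == "VARCHAR":
        # VARCHAR values are stored exactly as read, spaces included.
        return raw
    # All other types are trimmed first, so "  " becomes "" (a blank).
    return raw.strip()
```

Under this model a value of two spaces reaches the blank-handling policy as an empty string for an INT column, but is stored unchanged for a VARCHAR column.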
   
   This is all a first draft. Let's get this into the hands of users as an "alpha", get some
feedback, and adjust the code based on what we learn.

