drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] paul-rogers opened a new pull request #1675: DRILL-7055: Revise SELECT * to exclude partitions
Date Sun, 03 Mar 2019 22:43:51 GMT
paul-rogers opened a new pull request #1675: DRILL-7055: Revise SELECT * to exclude partitions
URL: https://github.com/apache/drill/pull/1675
 
 
   Historically, a SELECT * (wildcard) query on a partitioned table included the partition directory
names as a set of "dir0", "dir1" columns. When used with files at different depths, this can
lead to schema change exceptions because some readers create, say, "dir0" and "dir1", while others
create just "dir0".
   
   The result is one of three outcomes: 1) things just work, 2) the client gets some batches with
two partition columns and others with one, or 3) a hard schema change occurs as the project operator
creates missing columns as nullable int.
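
   The depth mismatch described above can be modeled in a few lines. This is only an illustrative
sketch, not Drill's reader code: the function name `partition_columns`, the table root, and the
file paths are all hypothetical, and it assumes each directory level between the table root and
the file becomes one "dirN" column.

```python
from pathlib import PurePosixPath
from typing import Dict


def partition_columns(table_root: str, file_path: str) -> Dict[str, str]:
    """Map each directory level between the table root and the file to dirN."""
    relative = PurePosixPath(file_path).relative_to(PurePosixPath(table_root))
    directories = relative.parts[:-1]  # drop the file name itself
    return {f"dir{i}": name for i, name in enumerate(directories)}


# Hypothetical layout: one file sits one level deep, the other two levels deep.
root = "/data/sales"
shallow = partition_columns(root, "/data/sales/2018/summary.csv")
deep = partition_columns(root, "/data/sales/2019/03/orders.csv")

print(sorted(shallow))  # ['dir0']
print(sorted(deep))     # ['dir0', 'dir1']
```

   The two readers produce different column sets for the same table, which is the source of the
schema change.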
   
   This change proposes to include only table columns when using the wildcard and to no longer
include partition columns. Partition columns will now work the way the "implicit" file columns
already work, so this change improves consistency.
   
   The partition columns are still available; they can be requested explicitly:
   
   ```
   SELECT *, dir0, dir1 FROM ...
   ```
   
   Both before and after this change, when including the partition columns explicitly, the
nullable int issue described above will occur. However, this change positions us for the revised
scan framework that will properly provide the partition columns as nullable VARCHAR whether
a matching directory exists or not.
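
   As a rough model of the proposed projection rule (a sketch under stated assumptions, not the
actual implementation in this pull request), the wildcard expands to table columns only, while an
explicitly named partition column is returned if the directory level exists and is NULL otherwise.
The function `project` and the sample data are hypothetical, and the sketch deliberately ignores
the column type (nullable int versus nullable VARCHAR) discussed above.

```python
from typing import Any, Dict, List


def project(select_list: List[str],
            table_row: Dict[str, Any],
            partition_values: Dict[str, str]) -> Dict[str, Any]:
    """Return the output row for one file under the proposed wildcard rule."""
    out: Dict[str, Any] = {}
    for item in select_list:
        if item == "*":
            # Wildcard: table columns only, no partition columns.
            out.update(table_row)
        elif item in table_row:
            out[item] = table_row[item]
        else:
            # Explicit partition column: None stands in for NULL when the
            # file has no directory at that level.
            out[item] = partition_values.get(item)
    return out


row = {"id": 1, "amount": 9.5}       # hypothetical table columns
partitions = {"dir0": "2018"}        # this file is only one level deep

print(project(["*"], row, partitions))
# {'id': 1, 'amount': 9.5}
print(project(["*", "dir0", "dir1"], row, partitions))
# {'id': 1, 'amount': 9.5, 'dir0': '2018', 'dir1': None}
```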
   
   This is a potentially breaking change: any user who uses SELECT * and expects partition
columns (and manages to work around the schema change issues) will see different behavior
and will have to revise queries to name the partition columns explicitly.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
