drill-dev mailing list archives

From sachouche <...@git.apache.org>
Subject [GitHub] drill issue #976: DRILL-5797: Choose parquet reader from read columns
Date Wed, 01 Nov 2017 17:51:00 GMT
Github user sachouche commented on the issue:

    https://github.com/apache/drill/pull/976
  
    Looking at the stack trace:
    - The code definitely is initializing a column of type REPEATABLE
    - The Fast Reader didn't expect this scenario, so it used a default container (NullableVarBinary) for the variable-length (VL) binary data type
    
    Why is this happening?
    - The code in ReadState::buildReader() is processing all selected columns
    - This information is obtained from the ParquetSchema
    - Looking at the code, this seems to be a case-sensitivity issue
    - The ParquetSchema is case-insensitive whereas the Parquet GroupType is not
    - Damien added a catch handler (column not found) to handle use-cases where we are projecting non-existing columns
    - This leads to an unforeseen use-case:
    - Assume column XYZ is complex
    - The user refers to it by an alias that differs only in case (xyz)
    - The new code lets this column pass and treats it as simple, since the case-sensitive GroupType lookup does not find it
    - The ParquetSchema, being case-insensitive, still processes this column
    - Hence the exception in the test suite
    
    Suggested Fix
    - Create a map (key to-lower-case) and register all current row-group columns
    - Use this map to locate a selected column type
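
    The suggested fix could look roughly like the sketch below. It is only an illustration, not the actual Drill code: the class ColumnLookupSketch, the enum ColumnKind, and the methods register() and find() are hypothetical names, and a real change would register the row-group's Parquet types rather than a stand-in enum.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a case-insensitive lookup of row-group column types.
class ColumnLookupSketch {

  // Stand-in (assumption) for the type information held by the row-group schema.
  enum ColumnKind { SIMPLE, COMPLEX }

  // Keys are column names folded to lower case.
  private final Map<String, ColumnKind> byLowerName = new HashMap<>();

  // Register one row-group column under its lower-cased name.
  void register(String name, ColumnKind kind) {
    byLowerName.put(name.toLowerCase(Locale.ROOT), kind);
  }

  // Locate a selected column regardless of case; returns null if the
  // column does not exist in the current row group.
  ColumnKind find(String selectedName) {
    return byLowerName.get(selectedName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    ColumnLookupSketch lookup = new ColumnLookupSketch();
    lookup.register("XYZ", ColumnKind.COMPLEX);
    lookup.register("id", ColumnKind.SIMPLE);

    System.out.println(lookup.find("xyz"));     // found despite the case mismatch
    System.out.println(lookup.find("missing")); // null: genuinely non-existing column
  }
}
```

    With a lookup like this, a selected column that differs only in case resolves to its real (complex) type, while a null result still identifies a projected column that genuinely does not exist.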


