Github user sachouche commented on the issue:
https://github.com/apache/drill/pull/976
Looking at the stack trace:
- The code definitely is initializing a column of type REPEATABLE
- The Fast Reader didn't expect this scenario, so it used a default container (NullableVarBinary)
for the variable-length (VL) binary data type
Why is this happening?
- The code in ReadState::buildReader() is processing all selected columns
- This information is obtained from the ParquetSchema
- Looking at the code, this seems to be a case-sensitivity issue
- The ParquetSchema is case-insensitive whereas the Parquet GroupType is not
- Damien added a catch handler (column not found) to handle use-cases where we are projecting
nonexistent columns
- This is leading to an unforeseen use-case
- Assume column XYZ is complex
- User uses an alias (xyz)
- The new code will allow this column to pass and treat it as simple
- The ParquetSchema, being case-insensitive, will process this column
- Hence the exception in the test suite
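To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use the Drill or Parquet classes: the two maps only stand in for a case-insensitive lookup (like the ParquetSchema) and a case-sensitive one (like the Parquet GroupType), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in, not Drill code: shows how a name can be accepted by a
// case-insensitive lookup while being rejected by a case-sensitive one.
final class CaseMismatchSketch {

    // True when the case-insensitive side knows the column but the
    // case-sensitive side reports "column not found".
    static boolean isMismatch(String selected,
                              Map<String, String> insensitive,
                              Map<String, String> sensitive) {
        return insensitive.containsKey(selected) && !sensitive.containsKey(selected);
    }

    public static void main(String[] args) {
        // Case-sensitive view: the file declares the complex column as "XYZ".
        Map<String, String> sensitive = new HashMap<>();
        sensitive.put("XYZ", "complex");

        // Case-insensitive view of the same columns.
        Map<String, String> insensitive = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        insensitive.putAll(sensitive);

        System.out.println(isMismatch("xyz", insensitive, sensitive)); // true
        System.out.println(isMismatch("XYZ", insensitive, sensitive)); // false
    }
}
```

When the "column not found" case is swallowed and the column is then treated as simple, the `xyz` lookup above corresponds to the path that ends in the exception.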
Suggested Fix
- Create a map (key to-lower-case) and register all current row-group columns
- Use this map to locate a selected column type
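A rough sketch of the suggested fix, assuming column names are plain strings. The `RowGroupColumnIndex` class and `ColumnKind` enum are hypothetical names for illustration; the real code would register the actual Parquet column types of the current row group instead.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Drill: indexes row-group columns by
// lower-cased name so a selected column resolves regardless of case.
final class RowGroupColumnIndex {

    // Placeholder for the real column type information.
    enum ColumnKind { SIMPLE, COMPLEX }

    private final Map<String, ColumnKind> byLowerCaseName = new HashMap<>();

    // Register one column of the current row group under its lower-cased name.
    void register(String name, ColumnKind kind) {
        byLowerCaseName.put(name.toLowerCase(Locale.ROOT), kind);
    }

    // Returns the registered kind, or null when the column does not exist.
    ColumnKind lookup(String selectedName) {
        return byLowerCaseName.get(selectedName.toLowerCase(Locale.ROOT));
    }
}
```

With this, a selected column `xyz` resolves to the type registered for `XYZ`, so a complex column is no longer mistaken for a simple one, while a column that is truly missing still returns null and can follow the existing "column not found" path.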
---