Hi Denis,

You seem to be trying to read a Parquet 2.0 format file with the Parquet 1.10 reader that ships with Drill. Is there a specific reason you are using version 2.0?

~ Kunal

On 3/11/2019 10:13:39 AM, Denis Dudinski wrote:

Hello,

I have a Parquet 2.0 file which contains serialised Avro records. The records' Avro schema is plain but contains a couple of optional string fields:

{
  "namespace" : "proto.avro.v1",
  "type" : "record",
  "name" : "FactEntity",
  "fields" : [
    {"name" : "sensorName", "type" : "string"},
    {"name" : "sensorDesc", "type" : "string"},
    {"name" : "firstDeployed", "type" : "long"},
    {"name" : "lastRenewed", "type" : "long"},
    {"name" : "errMsg", "type" : ["null", "string"]},
    {"name" : "errDetails", "type" : ["null", "string"]}
  ]
}

When I try to query entities in this file with

SELECT t1.sensorName, t1.sensorDesc, t1.lastRenewed, t1.errMsg
FROM dfs.`/path/to/file` t1
LIMIT 10;

I get this error:

2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor - Starting fragment 0:0 on xxx:31010
2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG o.a.d.e.s.p.DrillParquetReader - Requesting schema message proto.avro.v1.FactEntity {
  required binary sensorName (UTF8);
  required binary sensorDesc (UTF8);
  required int64 firstDeployed;
  required int64 lastRenewed;
  optional binary errMsg (UTF8);
  optional binary errDetails (UTF8);
}
2019-03-07 12:07:30,615 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Error in drill parquet reader (complex). Message: Failure in setting up reader Parquet Metadata: null (Error in drill parquet reader (complex). Message: Failure in setting up reader Parquet Metadata: null)
org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: Error in drill parquet reader (complex). Message: Failure in setting up reader Parquet Metadata: null

Please, refer to logs for more information.
[Error Id: 2b5a06a0-fa8e-497b-848d-01aae15874ee ]
        at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633) ~[drill-common-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:101) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.jar:1.15.0]
        at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_161]
        at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_161]
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) [hadoop-common-2.7.4.jar:na]
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.jar:1.15.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in drill parquet reader (complex). Message: Failure in setting up reader Parquet Metadata: null
        at org.apache.drill.exec.store.parquet2.DrillParquetReader.handleAndRaise(DrillParquetReader.java:273) ~[drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:265) ~[drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas(ScanBatch.java:321) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:216) [drill-java-exec-1.15.0.jar:1.15.0]
        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:271) [drill-java-exec-1.15.0.jar:1.15.0]
        ... 27 common frames omitted
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error reading page.
  File path: /filepath/xxx
  Row count: 3730439
  Column Chunk Metadata: ColumnMetaData{GZIP [errMsg] optional binary errMsg (UTF8) [DELTA_BYTE_ARRAY], 16876631}
  Page Header: PageHeader(type:DATA_PAGE_V2, uncompressed_page_size:15, compressed_page_size:32, data_page_header_v2:DataPageHeaderV2(num_values:3730439, num_nulls:3730439, num_rows:3730439, encoding:DELTA_BYTE_ARRAY, definition_levels_byte_length:5, repetition_levels_byte_length:0, statistics:Statistics(null_count:3730439)))
  File offset: 16876631
  Size: 69
  Value read so far: 3730439
        at org.apache.parquet.hadoop.ColumnChunkIncReadStore$ColumnChunkIncPageReader.readPage(ColumnChunkIncReadStore.java:226) ~[drill-java-exec-1.15.0.jar:1.10.0]
        at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:525) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:638) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:353) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:80) ~[parquet-column-1.10.0.jar:1.10.0]
        at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:262) ~[drill-java-exec-1.15.0.jar:1.15.0]
        ... 30 common frames omitted
Caused by: java.io.IOException: not a gzip file
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:496) ~[hadoop-common-2.7.4.jar:na]
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:257) ~[hadoop-common-2.7.4.jar:na]
        at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:186) ~[hadoop-common-2.7.4.jar:na]
        at org.apache.parquet.hadoop.DirectCodecFactory$IndirectDecompressor.decompress(DirectCodecFactory.java:162) ~[parquet-hadoop-1.10.0.jar:1.10.0]
        at org.apache.parquet.hadoop.ColumnChunkIncReadStore$ColumnChunkIncPageReader.readPage(ColumnChunkIncReadStore.java:188) ~[drill-java-exec-1.15.0.jar:1.10.0]
        ... 43 common frames omitted

2019-03-07 12:07:30,616 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO o.a.d.e.w.fragment.FragmentExecutor - 237f20ac-b634-5300-06f5-6c731a8a97f2:0:0: State change requested RUNNING --> FAILED

I'm running queries via sqlline with the session parameter "set `store.parquet.use_new_reader` = true;" (otherwise the query fails even without the optional binary columns included). Is there some workaround for this problem?

Thanks,
Denis