drill-user mailing list archives

From Denis Dudinski <denis.dudin...@gmail.com>
Subject Drill 1.15.0 fails with error while querying Parquet 2.0 file
Date Thu, 07 Mar 2019 09:56:10 GMT
Hello,

I have a Parquet 2.0 file that contains serialised Avro records. The records' Avro schema is flat,
but it contains a couple of optional string fields:

{
    "namespace" : "proto.avro.v1",
    "type" : "record",
    "name" : "FactEntity",
    "fields" : [
        {"name" : "sensorName", "type" : "string"},
        {"name" : "sensorDesc", "type" : "string"},
        {"name" : "firstDeployed", "type" : "long"},
        {"name" : "lastRenewed", "type" : "long"},
        {"name" : "errMsg", "type" : ["null", "string"]},
        {"name" : "errDetails", "type" : ["null", "string"]}
    ]
}

When I try to query entities in this file with

SELECT 
t1.sensorName, 
t1.sensorDesc, 
t1.lastRenewed, 
t1.errMsg
FROM dfs.`/path/to/file` t1
LIMIT 10;

I get this error:

2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG o.a.d.e.w.fragment.FragmentExecutor
- Starting fragment 0:0 on xxx:31010
2019-03-07 12:07:30,593 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] DEBUG o.a.d.e.s.p.DrillParquetReader
- Requesting schema message proto.avro.v1.FactEntity {
  required binary sensorName (UTF8);
  required binary sensorDesc (UTF8);
  required int64 firstDeployed;
  required int64 lastRenewed;
  optional binary errMsg (UTF8);
  optional binary errDetails (UTF8);
}

2019-03-07 12:07:30,615 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO  o.a.d.exec.physical.impl.ScanBatch
- User Error Occurred: Error in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null (Error in drill parquet reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null)
org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: Error in drill parquet
reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null


Please, refer to logs for more information.

[Error Id: 2b5a06a0-fa8e-497b-848d-01aae15874ee ]
	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:633)
~[drill-common-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:293) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext(LimitRecordBatch.java:101)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:143)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:83)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:297)
[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:284)
[drill-java-exec-1.15.0.jar:1.15.0]
	at java.security.AccessController.doPrivileged(Native Method) [na:1.8.0_161]
	at javax.security.auth.Subject.doAs(Subject.java:422) [na:1.8.0_161]
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) [hadoop-common-2.7.4.jar:na]
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:284) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.15.0.jar:1.15.0]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_161]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_161]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_161]
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in drill parquet
reader (complex).
Message: Failure in setting up reader
Parquet Metadata: null
	at org.apache.drill.exec.store.parquet2.DrillParquetReader.handleAndRaise(DrillParquetReader.java:273)
~[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:265)
~[drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas(ScanBatch.java:321) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:216) [drill-java-exec-1.15.0.jar:1.15.0]
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:271) [drill-java-exec-1.15.0.jar:1.15.0]
	... 27 common frames omitted
Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error reading page.
File path: /filepath/xxx
Row count: 3730439
Column Chunk Metadata: ColumnMetaData{GZIP [errMsg] optional binary errMsg (UTF8)  [DELTA_BYTE_ARRAY],
16876631}
Page Header: PageHeader(type:DATA_PAGE_V2, uncompressed_page_size:15, compressed_page_size:32,
data_page_header_v2:DataPageHeaderV2(num_values:3730439, num_nulls:3730439, num_rows:3730439,
encoding:DELTA_BYTE_ARRAY, definition_levels_byte_length:5, repetition_levels_byte_length:0,
statistics:Statistics(null_count:3730439)))
File offset: 16876631
Size: 69
Value read so far: 3730439
	at org.apache.parquet.hadoop.ColumnChunkIncReadStore$ColumnChunkIncPageReader.readPage(ColumnChunkIncReadStore.java:226)
~[drill-java-exec-1.15.0.jar:1.10.0]
	at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:532) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:525) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:638) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:353)
~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:80)
~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:75)
~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:80) ~[parquet-column-1.10.0.jar:1.10.0]
	at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:262)
~[drill-java-exec-1.15.0.jar:1.15.0]
	... 30 common frames omitted
Caused by: java.io.IOException: not a gzip file
	at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.processBasicHeader(BuiltInGzipDecompressor.java:496)
~[hadoop-common-2.7.4.jar:na]
	at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.executeHeaderState(BuiltInGzipDecompressor.java:257)
~[hadoop-common-2.7.4.jar:na]
	at org.apache.hadoop.io.compress.zlib.BuiltInGzipDecompressor.decompress(BuiltInGzipDecompressor.java:186)
~[hadoop-common-2.7.4.jar:na]
	at org.apache.parquet.hadoop.DirectCodecFactory$IndirectDecompressor.decompress(DirectCodecFactory.java:162)
~[parquet-hadoop-1.10.0.jar:1.10.0]
	at org.apache.parquet.hadoop.ColumnChunkIncReadStore$ColumnChunkIncPageReader.readPage(ColumnChunkIncReadStore.java:188)
~[drill-java-exec-1.15.0.jar:1.10.0]
	... 43 common frames omitted
2019-03-07 12:07:30,616 [237f20ac-b634-5300-06f5-6c731a8a97f2:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 237f20ac-b634-5300-06f5-6c731a8a97f2:0:0: State change requested RUNNING --> FAILED 
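
A possible reading of the innermost "not a gzip file" cause, offered as an assumption and not a
confirmed diagnosis: per the Parquet format, a DATA_PAGE_V2 page stores its repetition and
definition levels uncompressed ahead of the compressed values, so a reader that hands the whole
page body to the gzip codec would see level bytes where the gzip magic bytes are expected. The
Python sketch below reproduces that failure mode on synthetic data. Only the 5-byte
definition-level length is taken from the page header above; the function names and byte values
are hypothetical, and nothing here is Drill code.

```python
import gzip

# Taken from the page header in the log above.
DEFINITION_LEVELS_BYTE_LENGTH = 5
REPETITION_LEVELS_BYTE_LENGTH = 0


def build_v2_page_body(levels: bytes, values: bytes) -> bytes:
    """Lay out a synthetic V2 page body: raw level bytes, then gzip-compressed values."""
    return levels + gzip.compress(values)


def decompress_whole_page(page: bytes) -> bytes:
    """Treat the entire page body as one gzip stream (the suspected mistake)."""
    return gzip.decompress(page)


def decompress_values_only(page: bytes, levels_len: int) -> tuple:
    """Skip the uncompressed levels and decompress only the values section."""
    return page[:levels_len], gzip.decompress(page[levels_len:])


if __name__ == "__main__":
    # Made-up level and value bytes, for illustration only.
    levels = b"\x00" * DEFINITION_LEVELS_BYTE_LENGTH
    page = build_v2_page_body(levels, b"synthetic values")
    levels_len = DEFINITION_LEVELS_BYTE_LENGTH + REPETITION_LEVELS_BYTE_LENGTH

    try:
        decompress_whole_page(page)
    except OSError as exc:  # gzip.BadGzipFile is a subclass of OSError
        print("whole page failed:", exc)

    _, values = decompress_values_only(page, levels_len)
    print("values section decompressed:", values)
```

If this reading is right, the error says nothing about the file being corrupt; it would only mean
the reader did not skip the level bytes before decompressing.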


I'm running the queries via sqlline with the session parameter "set `store.parquet.use_new_reader`
= true;" (otherwise the query fails even when no optional binary columns are selected).

Is there some workaround for this problem?

Thanks,
Denis