I keep getting an NPE whenever I try to read Parquet files generated by Spark, using an Apache Drill 1.18 nightly build (June 9). The data is written out in Hive-style partition directories:
$ ls /mnt/Drill/parqJsDf_0625/dt\=2016-10-31/ | head -n 2
part-00000-blah.snappy.parquet
part-00001-blah.snappy.parquet
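For context, the Spark side of the write is nothing exotic. A minimal sketch of what it looks like (the source DataFrame and input path here are illustrative, not my exact job):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("write-parquet").getOrCreate()

// Illustrative source; the real input is a JSON-derived DataFrame.
val df = spark.read.json("/mnt/input/events.json")

df.write
  .partitionBy("dt")                   // produces the dt=2016-10-31/ directories
  .option("compression", "snappy")     // also tested without compression
  .parquet("/mnt/Drill/parqJsDf_0625")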
No matter how I query it:
apache drill> select * from dfs.`mnt_drill`.`parqJsDf_0625` where dir0='dt=2016-10-31' limit 2;
apache drill> select * from dfs.`mnt_drill`.`parqJsDf_0625` limit 2;
I get the same NPE, thrown from the partition-column check during query planning:
Caused By (java.lang.NullPointerException) null
org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.checkForPartitionColumn():186
org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.collect():119
org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.<init>():59
org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getParquetGroupScanStatistics():293
org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getTableMetadata():249
org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata():203
org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init():170
org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>():95
org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>():48
org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build():415
org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():150
org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():120
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():202
org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():79
org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():226
org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():209
org.apache.drill.exec.planner.logical.DrillTable.getGroupScan():119
org.apache.drill.exec.planner.common.DrillScanRelBase.<init>():51
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():76
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():65
org.apache.drill.exec.planner.logical.DrillScanRel.<init>():58
org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():38
org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():208
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():633
org.apache.calcite.tools.Programs$RuleSetProgram.run():327
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():405
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():351
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():245
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():308
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():173
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():283
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():163
org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():128
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():93
org.apache.drill.exec.work.foreman.Foreman.runSQL():593
The files are valid Parquet: parquet-tools handles them just fine, and I can read the
same files back in with Spark. I have tested writing from Spark both with and without
partitioning, and both with and without Snappy compression. Always the same NPE.
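The Spark read-back check is just the obvious one (a sketch, using the same illustrative path as above):

val back = spark.read.parquet("/mnt/Drill/parqJsDf_0625")
back.show(2)   // reads cleanly, partition column dt included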
Any insight appreciated...