drill-user mailing list archives

From "Updike, Clark" <Clark.Upd...@jhuapl.edu>
Subject NPE reading parquet files generated by Spark
Date Mon, 29 Jun 2020 13:34:46 GMT
I keep getting an NPE whenever I try to read Parquet files generated by Spark, using a Drill 1.18 nightly build (June 9).

$ ls /mnt/Drill/parqJsDf_0625/dt\=2016-10-31/ | head -n 2
    part-00000-blah.snappy.parquet
    part-00001-blah.snappy.parquet
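
For reference, the write side of my Spark job looks roughly like the sketch below (simplified; the class name, the JSON source path, and the exact options are placeholders for my real job, which I also ran without partitionBy and without compression):

    // Rough sketch of the write side (simplified; the JSON source path is a
    // placeholder). partitionBy("dt") is what produces the dt=2016-10-31/
    // subdirectories listed above.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class WritePartitionedParquet {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("write-parquet")
            .getOrCreate();

        // Placeholder source; the real job builds the DataFrame from JSON.
        Dataset<Row> df = spark.read().json("/mnt/Drill/source.json");

        df.write()
            .partitionBy("dt")                  // creates dt=YYYY-MM-DD/ directories
            .option("compression", "snappy")    // also tried without compression
            .mode("overwrite")
            .parquet("/mnt/Drill/parqJsDf_0625");

        spark.stop();
      }
    }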

No matter how I query it:
    apache drill> select * from dfs.`mnt_drill`.`parqJsDf_0625` where dir0='dt\=2016-10-31' limit 2;
    apache drill> select * from dfs.`mnt_drill`.`parqJsDf_0625` limit 2;

I get an exception related to the partitioning:

Caused By (java.lang.NullPointerException) null
    org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.checkForPartitionColumn():186
    org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.collect():119
    org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.<init>():59
    org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getParquetGroupScanStatistics():293
    org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getTableMetadata():249
    org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata():203
    org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init():170
    org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>():95
    org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>():48
    org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build():415
    org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():150
    org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>():120
    org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():202
    org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan():79
    org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():226
    org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan():209
    org.apache.drill.exec.planner.logical.DrillTable.getGroupScan():119
    org.apache.drill.exec.planner.common.DrillScanRelBase.<init>():51
    org.apache.drill.exec.planner.logical.DrillScanRel.<init>():76
    org.apache.drill.exec.planner.logical.DrillScanRel.<init>():65
    org.apache.drill.exec.planner.logical.DrillScanRel.<init>():58
    org.apache.drill.exec.planner.logical.DrillScanRule.onMatch():38
    org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch():208
    org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp():633
    org.apache.calcite.tools.Programs$RuleSetProgram.run():327
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():405
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform():351
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel():245
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel():308
    org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():173
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():283
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan():163
    org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan():128
    org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():93
    org.apache.drill.exec.work.foreman.Foreman.runSQL():593

The files are valid Parquet: I can inspect them with parquet-tools without any problems, and I can read the same files back into Spark. I have tested writing from Spark both with and without partitioning, and both with and without Snappy compression, and it is always the same NPE.
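
The read-back check in Spark is just the straightforward round trip, along these lines (continuing from the write sketch above; the filter value is one of the partition directories):

    // Sanity check: the same directory reads back fine in Spark,
    // including filtering on the partition column.
    Dataset<Row> check = spark.read().parquet("/mnt/Drill/parqJsDf_0625");
    check.printSchema();                        // partition column "dt" shows up as expected
    check.filter("dt = '2016-10-31'").show(2);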
Any insight appreciated.
