drill-dev mailing list archives

From "cdmikechen (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-7934) NullPointerException error when reading parquet files
Date Tue, 01 Jun 2021 00:25:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cdmikechen resolved DRILL-7934.
-------------------------------
      Reviewer: Cong Luo
    Resolution: Fixed

Merged via PR [DRILL-7934: Fix NullPointerException error when reading parquet files by cdmikechen · Pull Request #2238 · apache/drill (github.com)|https://github.com/apache/drill/pull/2238].

> NullPointerException error when reading parquet files
> -----------------------------------------------------
>
>                 Key: DRILL-7934
>                 URL: https://issues.apache.org/jira/browse/DRILL-7934
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.18.0
>         Environment: Drill 1.18 
> Ambari 2.7.4
> Spark 3.0.2
>            Reporter: cdmikechen
>            Priority: Critical
>             Fix For: 1.19.0
>
>         Attachments: part-00000-e849bed7-5cc2-480c-96d8-3fe5f9b4294a-c000.snappy.parquet, part-00001-e849bed7-5cc2-480c-96d8-3fe5f9b4294a-c000.snappy.parquet
>
>
> I created a dataset using Spark ML. When I use Drill 1.18 to query this dataset folder, it reports the following error:
> {code:java}
> [Error Id: 92d3f331-ffca-46b5-a64c-87453b88a108 on xxx.xxx.xxx:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:788)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.checkCommonStates(QueryStateProcessor.java:322)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.planning(QueryStateProcessor.java:216)
>         at org.apache.drill.exec.work.foreman.QueryStateProcessor.moveToState(QueryStateProcessor.java:76)
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:300)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: Error while applying rule DrillPushProjectIntoScanRule:enumerable, args [rel#478:LogicalProject.NONE.ANY([]).[](input=RelSubset#477,label=$1), rel#452:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:301)
>         ... 3 common frames omitted
> Caused by: java.lang.RuntimeException: Error while applying rule DrillPushProjectIntoScanRule:enumerable, args [rel#478:LogicalProject.NONE.ANY([]).[](input=RelSubset#477,label=$1), rel#452:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[hdfs_dataset.default, /home/spark/dataset/default/test2/*.parquet])]
>         at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:235)
>         at org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:633)
>         at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:327)
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:405)
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.transform(DefaultSqlHandler.java:351)
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRawDrel(DefaultSqlHandler.java:245)
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToDrel(DefaultSqlHandler.java:308)
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:173)
>         at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
>         at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
>         at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
>         at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
>         at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
>         ... 3 common frames omitted
> Caused by: java.lang.NullPointerException: null
>         at org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.checkForPartitionColumn(ParquetGroupScanStatistics.java:186)
>         at org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.collect(ParquetGroupScanStatistics.java:119)
>         at org.apache.drill.exec.store.parquet.ParquetGroupScanStatistics.<init>(ParquetGroupScanStatistics.java:59)
>         at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getParquetGroupScanStatistics(BaseParquetMetadataProvider.java:293)
>         at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.getTableMetadata(BaseParquetMetadataProvider.java:249)
>         at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.initializeMetadata(BaseParquetMetadataProvider.java:203)
>         at org.apache.drill.exec.store.parquet.BaseParquetMetadataProvider.init(BaseParquetMetadataProvider.java:170)
>         at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:95)
>         at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl.<init>(ParquetTableMetadataProviderImpl.java:48)
>         at org.apache.drill.exec.metastore.store.parquet.ParquetTableMetadataProviderImpl$Builder.build(ParquetTableMetadataProviderImpl.java:415)
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:150)
>         at org.apache.drill.exec.store.parquet.ParquetGroupScan.<init>(ParquetGroupScan.java:120)
>         at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:202)
>         at org.apache.drill.exec.store.parquet.ParquetFormatPlugin.getGroupScan(ParquetFormatPlugin.java:79)
>         at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:226)
>         at org.apache.drill.exec.store.dfs.FileSystemPlugin.getPhysicalScan(FileSystemPlugin.java:209)
>         at org.apache.drill.exec.planner.logical.DrillTable.getGroupScan(DrillTable.java:119)
>         at org.apache.drill.exec.planner.logical.DrillPushProjectIntoScanRule.canPushProjectIntoScan(DrillPushProjectIntoScanRule.java:190)
>         at org.apache.drill.exec.planner.logical.DrillPushProjectIntoScanRule.onMatch(DrillPushProjectIntoScanRule.java:107)
>         at org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:208)
>         ... 16 common frames omitted
> {code}
> It is the same as issue https://issues.apache.org/jira/browse/DRILL-7769.
> I added some logging and found this:
> {code:java}
> TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`values`.`list`.`element` with major type null
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> 2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `label` with major type minor_type: FLOAT8
> mode: REQUIRED
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> 2021-05-25 15:39:21,066 [1f535658-f840-9f0e-1a7b-21080514bb7b:foreman] TRACE o.a.d.e.s.p.ParquetGroupScanStatistics - check schema path `features`.`size` with major type minor_type: INT
> mode: OPTIONAL
>  current partitionColTypeMap = {`features`.`indices`.`list`.`element`=null, `features`.`type`=minor_type: TINYINT
> mode: REQUIRED
> , `features`.`size`=minor_type: INT
> mode: OPTIONAL
> }
> {code}
> So under some conditions the major type is null, and when Drill runs the following code it throws a NullPointerException:
> {code:java}
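> // ParquetGroupScanStatistics.java, line 121: majorType is null when columnMetadata is null
> TypeProtos.MajorType majorType = columnMetadata != null ? columnMetadata.majorType() : null;
> // ParquetGroupScanStatistics.java, line 189: NPE when get(schemaPath) returns a stored null
> !partitionColTypeMap.get(schemaPath).equals(type)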
> {code}
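To illustrate why the check at line 189 fails, here is a minimal stand-alone sketch. It does not use Drill's classes: `ColType`, `NullValueInMapDemo`, `typeChanged` and `throwsNpe` are hypothetical names standing in for `TypeProtos.MajorType` and the logic in `ParquetGroupScanStatistics`. The point it shows is that `HashMap` accepts null values, so `get()` can return null for a key that is present, and calling `equals` on that result throws.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Drill's TypeProtos.MajorType (not the real class).
enum ColType { TINYINT, INT, FLOAT8 }

class NullValueInMapDemo {
    // Same shape as the failing check: if the map holds null for schemaPath,
    // get(...) returns null and calling equals(...) on it throws an NPE.
    static boolean typeChanged(Map<String, ColType> partitionColTypeMap,
                               String schemaPath, ColType type) {
        return !partitionColTypeMap.get(schemaPath).equals(type);
    }

    // Reports whether typeChanged(...) throws a NullPointerException.
    static boolean throwsNpe(Map<String, ColType> map, String schemaPath, ColType type) {
        try {
            typeChanged(map, schemaPath, type);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, ColType> map = new HashMap<>();
        // A column whose metadata was missing: null is stored as its type.
        map.put("`features`.`indices`.`list`.`element`", null);
        map.put("`features`.`size`", ColType.INT);

        // The key is present even though its value is null.
        System.out.println(map.containsKey("`features`.`indices`.`list`.`element`")); // true
        System.out.println(throwsNpe(map, "`features`.`indices`.`list`.`element`", null)); // true
        System.out.println(throwsNpe(map, "`features`.`size`", ColType.INT)); // false
    }
}
```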
> We need to replace null with *org.apache.drill.common.types.Types.NULL* to avoid the NullPointerException.
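A minimal sketch of the proposed approach, assuming the idea is simply to store a non-null sentinel type instead of null so that `equals` is never called on a null reference. `SketchType`, `SentinelTypeMap`, `orNullType` and `recordAndCheckChanged` are hypothetical names; `SketchType.NULL` stands in for `Types.NULL`. This is an illustration only, not the code merged in PR #2238.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Drill's MajorType; NULL plays the role of
// org.apache.drill.common.types.Types.NULL in this sketch.
enum SketchType { NULL, TINYINT, INT, FLOAT8 }

class SentinelTypeMap {
    private final Map<String, SketchType> partitionColTypeMap = new HashMap<>();

    // Normalize a missing type to the sentinel instead of keeping null.
    static SketchType orNullType(SketchType type) {
        return type != null ? type : SketchType.NULL;
    }

    // Records the type seen for schemaPath and returns true if it differs from
    // the type recorded earlier for the same path.
    boolean recordAndCheckChanged(String schemaPath, SketchType rawType) {
        SketchType type = orNullType(rawType);
        // Only non-null values are ever stored, so a null result from
        // putIfAbsent means "path not seen before".
        SketchType previous = partitionColTypeMap.putIfAbsent(schemaPath, type);
        return previous != null && !previous.equals(type);
    }

    public static void main(String[] args) {
        SentinelTypeMap stats = new SentinelTypeMap();
        String path = "`features`.`indices`.`list`.`element`";
        System.out.println(stats.recordAndCheckChanged(path, null));           // false: first sighting
        System.out.println(stats.recordAndCheckChanged(path, null));           // false: NULL equals NULL, no NPE
        System.out.println(stats.recordAndCheckChanged(path, SketchType.INT)); // true: type differs
    }
}
```

In Drill itself the corresponding change is the one the report proposes on line 121, substituting `Types.NULL` for `null`; the sketch only shows why a non-null sentinel makes the later comparison safe.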



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
