spark-user mailing list archives

From Heng Su <permanent.s...@gmail.com>
Subject Datasource v2 cannot prune file source partitions when readDataSchema is empty
Date Tue, 14 Sep 2021 06:00:15 GMT
Hi, community:

We use Spark 3.1.2.

In the PruneFileSourcePartitions rule, FileScan::withFilters is called to push down the
partition-pruning filters (and this is the only place that function is called), but the call
is guarded by the constraint scan.readDataSchema.nonEmpty
(https://github.com/apache/spark/blob/de351e30a90dd988b133b3d00fa6218bfcaba8b8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala#L114).
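
To make the condition concrete, here is a simplified sketch of the guard as I read it. This is
not the actual Spark source (see the link above for that); ScanInfo and the helper name are made
up for the illustration:

    import org.apache.spark.sql.catalyst.expressions.Expression
    import org.apache.spark.sql.types.StructType

    object PruningGuardSketch {
      // Stand-in for the part of FileScan the rule inspects (not the real class).
      case class ScanInfo(readDataSchema: StructType)

      // The rule only calls FileScan::withFilters when there is something to push
      // AND readDataSchema is non-empty; count(*) leaves readDataSchema empty, so
      // this check fails and the scan keeps all partitions.
      def eligibleForPartitionPruning(scan: ScanInfo, filters: Seq[Expression]): Boolean =
        filters.nonEmpty && scan.readDataSchema.nonEmpty
    }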

We use Spark SQL with a custom catalog and execute a count query like: select count(*) from
catalog.db.tbl where dt='0812', in which dt is a partition key.
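
For reference, here is a minimal sketch of how we hit this, assuming the custom catalog is
already registered under the name "catalog" and db.tbl is a table partitioned by dt as in the
query above (the app name is just an example):

    import org.apache.spark.sql.SparkSession

    object CountPruningRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("count-partition-pruning-repro")
          .getOrCreate()

        // count(*) references no data columns, which is what leaves readDataSchema empty.
        val df = spark.sql("SELECT count(*) FROM catalog.db.tbl WHERE dt = '0812'")

        // Inspecting the physical plan shows the dt filter is not pushed into the
        // file scan as a partition filter, so every partition gets listed and scanned.
        df.explain(true)
        df.show()

        spark.stop()
      }
    }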

In this case scan.readDataSchema is indeed empty, so no partition pruning is applied to the scan,
which causes all partitions to be scanned in the end.

Is there something I have misunderstood? Any help is appreciated.

Thank you.



