spark-dev mailing list archives

From Guy Khazma
Subject [Spark 3.0] DataSourceV2 FileScan - Hive style partition pruning
Date Mon, 30 Dec 2019 14:41:08 GMT

It seems that Hive-style partition pruning is not working for file-based
data sources such as Parquet and ORC in DataSourceV2.
This causes serious performance degradation for non-Hive tables.

The reason is that the FileScan abstract class is not aware of the
partition and data filters.
The method that computes selectedPartitions calls the FileIndex listFiles
method with an empty sequence for both - see here


In the v1 data source, the FileSourceScanExec class receives the partition
and data filters and uses them to skip unnecessary partitions by passing
them to the listFiles function - see here
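
To make the difference concrete, here is a minimal toy model in plain
Python. It is only a sketch and does not use Spark: ToyFileIndex,
PartitionDirectory and list_files are hypothetical stand-ins for a file
index with a listFiles-like method, and the year=... layout is an assumed
example. It shows that passing empty filter sequences returns every
partition, while passing a partition filter prunes the listing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical stand-ins for illustration only; these are NOT Spark classes.
PartitionValues = Dict[str, str]
PartitionFilter = Callable[[PartitionValues], bool]


@dataclass
class PartitionDirectory:
    values: PartitionValues   # e.g. {"year": "2019"} for a year=2019 directory
    files: List[str]          # data files found under that directory


class ToyFileIndex:
    def __init__(self, partitions: Sequence[PartitionDirectory]) -> None:
        self.partitions = list(partitions)

    def list_files(
        self,
        partition_filters: Sequence[PartitionFilter],
        data_filters: Sequence[object],
    ) -> List[PartitionDirectory]:
        # data_filters is accepted only to mirror the two-argument call shape;
        # this toy ignores it. A partition is kept when every partition filter
        # accepts its values, so an empty filter list keeps everything.
        return [
            p for p in self.partitions
            if all(f(p.values) for f in partition_filters)
        ]


index = ToyFileIndex([
    PartitionDirectory({"year": "2018"}, ["year=2018/part-0.parquet"]),
    PartitionDirectory({"year": "2019"}, ["year=2019/part-0.parquet"]),
])


def year_is_2019(values: PartitionValues) -> bool:
    return values["year"] == "2019"


# Behaviour described for FileScan: both sequences empty, nothing is pruned.
unpruned = index.list_files([], [])

# Behaviour described for the v1 path: the partition filter is passed through.
pruned = index.list_files([year_is_2019], [])

print(len(unpruned), len(pruned))
```

In this toy, the unpruned call lists both directories and the pruned call
lists only year=2019, which is the saving that is lost when the filters
never reach the listing step.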


Are there any plans to add support for this?

