spark-user mailing list archives

From: Xuelin Cao
Subject: Why Parquet Predicate Pushdown doesn't work?
Date: Wed, 07 Jan 2015 06:18:58 GMT

I'm testing the Parquet file format, and predicate pushdown is a very useful
feature for us.

However, predicate pushdown doesn't seem to take effect after I set:

    sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")

Here is my SQL:

    sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect

Then I checked the amount of input data in the web UI. The input size is
ALWAYS 80.2M, regardless of whether I turn the
spark.sql.parquet.filterPushdown flag on or off.

I'm not sure whether there is anything I must do when generating the Parquet
file to make predicate pushdown possible. (As with ORC, where I need to
explicitly sort the field that will be used for predicate pushdown when
creating the file.)

Does anyone have any idea? And does anyone know the internal mechanism of
Parquet predicate pushdown?
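
For context on the sorting question, here is a minimal sketch of statistics-based row-group skipping, the general idea behind Parquet predicate pushdown: each row group carries min/max statistics per column, and a reader can skip a row group when the predicate value falls outside that range. This is a toy model in plain Scala, not Parquet's or Spark's actual API; `RowGroup`, `mightContain`, `toRowGroups`, and `groupsScanned` are hypothetical names, and the row counts and group size are made up for illustration.

```scala
// Toy model of statistics-based row-group skipping (NOT Parquet's real API).
// All names here are hypothetical and exist only for illustration.
object RowGroupSkipping {
  // A row group holds a chunk of groupId values plus its min/max statistics.
  final case class RowGroup(values: Vector[Int]) {
    val min: Int = values.min
    val max: Int = values.max
    // An equality predicate can only match if the target lies within [min, max].
    def mightContain(target: Int): Boolean = min <= target && target <= max
  }

  // Split a column into fixed-size row groups, preserving file order.
  def toRowGroups(column: Vector[Int], groupSize: Int): Vector[RowGroup] =
    column.grouped(groupSize).map(RowGroup(_)).toVector

  // Count the row groups a reader would still have to scan for `col = target`.
  def groupsScanned(groups: Vector[RowGroup], target: Int): Int =
    groups.count(_.mightContain(target))

  def main(args: Array[String]): Unit = {
    // 1000 rows with 10 distinct groupId values, interleaved (unsorted) layout.
    val unsorted: Vector[Int] = Vector.tabulate(1000)(i => 10113000 + (i % 10))
    val sorted: Vector[Int] = unsorted.sorted
    val target = 10113000

    val scannedUnsorted = groupsScanned(toRowGroups(unsorted, 100), target)
    val scannedSorted = groupsScanned(toRowGroups(sorted, 100), target)

    println(s"unsorted layout: $scannedUnsorted of 10 row groups scanned")
    println(s"sorted layout:   $scannedSorted of 10 row groups scanned")
  }
}
```

In this model the interleaved layout scans all 10 row groups, because every group's [min, max] range covers the target, while the sorted layout scans only 1. Under the same assumptions, a file that fits in a single row group leaves nothing to skip, which would also produce an unchanged input size.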