spark-user mailing list archives

From Xuelin Cao <xuelin...@yahoo.com.INVALID>
Subject Why Parquet Predicate Pushdown doesn't work?
Date Wed, 07 Jan 2015 06:18:58 GMT

Hi,
       I'm testing the Parquet file format, and predicate pushdown is a very useful feature
for us.
       However, it looks like predicate pushdown doesn't take effect after I set:

           sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")

       Here is my SQL:

           sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect

       Then I checked the amount of input data in the web UI, but it is ALWAYS 80.2M
regardless of whether I turn the spark.sql.parquet.filterPushdown flag on or off.
       I'm not sure whether there is anything I must do when generating the Parquet file
to make predicate pushdown available. (With ORC, for example, when creating the file
I need to explicitly sort on the field that will be used for predicate pushdown.)
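
       To make the sorting concern concrete, here is a toy sketch in plain Scala. It does
not use any Spark or Parquet API; RowGroupStats, rowGroups and groupsToRead are
hypothetical names made up for illustration. It assumes that pushdown works by keeping
min/max statistics per block of rows and skipping blocks whose range cannot contain the
filter value, which is my understanding of the general idea, not a confirmed description
of what Spark does here.

```scala
// Toy model of min/max-based block skipping; not Spark or Parquet code.
object RowGroupPruningSketch {
  // Hypothetical stand-in for per-block column statistics.
  final case class RowGroupStats(min: Int, max: Int, rows: Int)

  // Split the column into fixed-size blocks and record min/max of each block.
  def rowGroups(values: Seq[Int], groupSize: Int): Vector[RowGroupStats] =
    values.grouped(groupSize).map(g => RowGroupStats(g.min, g.max, g.size)).toVector

  // For an equality filter, a block must be read unless target lies outside [min, max].
  def groupsToRead(groups: Seq[RowGroupStats], target: Int): Int =
    groups.count(g => g.min <= target && target <= g.max)

  // 1000 rows, ten distinct groupId values cycling row by row (unsorted layout).
  val interleaved: Vector[Int] = (0 until 1000).map(i => 10113000 + i % 10).toVector
  // Same rows, sorted on groupId before "writing".
  val sorted: Vector[Int] = interleaved.sorted

  def main(args: Array[String]): Unit = {
    val target = 10113000
    // Every unsorted block spans the full id range, so none can be skipped.
    println(groupsToRead(rowGroups(interleaved, 100), target)) // prints 10
    // After sorting, each block holds a single id, so only one block matches.
    println(groupsToRead(rowGroups(sorted, 100), target))      // prints 1
  }
}
```

       If the real mechanism is similar, unsorted groupId values would leave nothing to
skip, and the input size would stay the same whether the flag is on or off.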
       Does anyone have any idea?
       Also, does anyone know the internal mechanism of Parquet predicate pushdown?
       Thanks
 