I'm testing the Parquet file format, and predicate pushdown is a very useful feature for us.
However, it looks like predicate pushdown doesn't work after I set spark.sql.parquet.filterPushdown to true.
Here is my SQL:
sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect
Then I checked the amount of input data in the Spark web UI, but it is always 80.2M regardless of whether I turn the spark.sql.parquet.filterPushdown flag on or off.
I'm not sure whether there is anything I must do when generating the Parquet file to make predicate pushdown take effect. (With ORC, for example, I need to explicitly sort by the field that will be used for predicate pushdown when creating the file.)
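For context on the ORC comparison: Parquet also records min/max statistics for each column in every row group, and a reader with a pushed-down filter can skip row groups whose [min, max] range excludes the filter value. So the physical layout matters in much the same way. The toy Scala model below is not Spark's or Parquet's actual code; the object name, the 100-row "row groups" and the generated groupId values are all hypothetical. It shows why sorting by the filter column before writing tends to help:

```scala
// Toy model (hypothetical, not Parquet's real code) of how per-row-group
// min/max statistics let a reader skip data for an equality filter.
object SortedLayoutDemo {
  // Stand-in for the column statistics Parquet keeps for each row group.
  final case class RowGroupStats(min: Long, max: Long)

  // Split the column into fixed-size "row groups" and record min/max for each.
  def buildStats(values: Seq[Long], rowsPerGroup: Int): Seq[RowGroupStats] =
    values.grouped(rowsPerGroup).map(g => RowGroupStats(g.min, g.max)).toSeq

  // For `col = v`, a row group can be skipped when v lies outside [min, max].
  def skippable(stats: Seq[RowGroupStats], v: Long): Int =
    stats.count(s => v < s.min || v > s.max)

  def main(args: Array[String]): Unit = {
    // Hypothetical groupId column: 10 distinct ids, 100 rows each, interleaved.
    val unsorted: Seq[Long] = (0 until 1000).map(i => 10113000L + (i % 10))
    val sorted: Seq[Long] = unsorted.sorted
    val target = 10113000L
    println(s"unsorted: skip ${skippable(buildStats(unsorted, 100), target)} of 10 row groups")
    println(s"sorted:   skip ${skippable(buildStats(sorted, 100), target)} of 10 row groups")
  }
}
```

With interleaved ids every row group spans the whole id range, so nothing can be ruled out; after sorting, each group holds a single id and 9 of the 10 groups are skipped from statistics alone. Whether that shows up as a smaller input size in the web UI depends on how Spark accounts for skipped row groups, which this model does not cover.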
Does anyone have any idea?
Also, does anyone know how Parquet predicate pushdown works internally?
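As far as I understand the mechanism: Spark converts the WHERE clause into a Parquet filter predicate, and the Parquet reader compares that predicate with the min/max statistics stored per row group in the file footer, dropping row groups that cannot contain a match (parquet-mr does this in its StatisticsFilter class). The sketch below is a simplified, hypothetical model of that decision, not the parquet-mr API: Stats, Pred and canDrop are made-up names, and only Long columns with =, <, >, AND and OR are handled.

```scala
// Toy model of statistics-based row-group filtering (hypothetical names,
// not the parquet-mr API): decide whether a row group can be dropped
// using only the column's min/max, never the rows themselves.
object StatsFilterSketch {
  final case class Stats(min: Long, max: Long)

  sealed trait Pred
  final case class Eq(v: Long) extends Pred        // col = v
  final case class Lt(v: Long) extends Pred        // col < v
  final case class Gt(v: Long) extends Pred        // col > v
  final case class And(l: Pred, r: Pred) extends Pred
  final case class Or(l: Pred, r: Pred) extends Pred

  // true  => no row in this group can satisfy p, so the group is skipped.
  // false => the group might match and must be read and filtered row by row.
  def canDrop(p: Pred, s: Stats): Boolean = p match {
    case Eq(v)     => v < s.min || v > s.max
    case Lt(v)     => s.min >= v
    case Gt(v)     => s.max <= v
    case And(l, r) => canDrop(l, s) || canDrop(r, s)
    case Or(l, r)  => canDrop(l, s) && canDrop(r, s)
  }

  def main(args: Array[String]): Unit = {
    // Two hypothetical row groups with non-overlapping groupId ranges.
    val groups = Seq(Stats(10112000L, 10112999L), Stats(10113000L, 10113999L))
    val where = Eq(10113000L) // groupId = 10113000
    groups.zipWithIndex.foreach { case (s, i) =>
      println(s"row group $i: ${if (canDrop(where, s)) "skipped" else "read"}")
    }
  }
}
```

Note that canDrop is conservative: false only means a row group might match, so the rows that survive are still filtered normally after being read.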