But Xuelin already posted in the original message that the code was using

SET spark.sql.parquet.filterPushdown=true

On Wed, Jan 7, 2015 at 12:42 AM, Daniel Haviv <danielrulez@gmail.com> wrote:
Quoting Michael:
Predicate pushdown into the input format is turned off by default because there is a bug in the current Parquet library that throws a null pointer exception when there are row groups that are entirely null.


On 7 Jan 2015, at 08:18, Xuelin Cao <xuelincao@yahoo.com.INVALID> wrote:


       I'm testing the Parquet file format, and predicate pushdown is a very useful feature for us.

       However, it looks like predicate pushdown doesn't work after I set
       sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
       Here is my sql:
       sqlContext.sql("select adId, adTitle  from ad where groupId=10113000").collect

       Then I checked the amount of input data in the web UI, but it is ALWAYS 80.2M regardless of whether I turn the spark.sql.parquet.filterPushdown flag on or off.

       I'm not sure whether there is anything I must do when generating the Parquet file to make predicate pushdown available. (With ORC, for example, I need to explicitly sort the field that will be used for predicate pushdown when creating the file.)

       Does anyone have any idea?

       Also, does anyone know the internal mechanism for Parquet predicate pushdown?
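
On the internal mechanism question, here is a rough sketch of the general idea rather than the actual Parquet or Spark code. Parquet stores min/max statistics per column for each row group, and a pushed-down filter can skip any row group whose statistics rule out a match. The names RowGroup, mightContain and scan below are hypothetical, made up for illustration, and the sample groupId values are invented. The sketch assumes an equality predicate on a single integer column.

```scala
// Toy model of row-group skipping. NOT the Parquet or Spark API:
// RowGroup, mightContain and scan are hypothetical names for illustration.
final case class RowGroup(groupIds: Vector[Int]) {
  // Stand-ins for the per-column min/max statistics kept for each row group.
  val min: Int = groupIds.min
  val max: Int = groupIds.max
}

object PushdownSketch {
  // An equality predicate can only match if the target lies within [min, max].
  def mightContain(rg: RowGroup, target: Int): Boolean =
    rg.min <= target && target <= rg.max

  // Returns (matching values, number of row groups that had to be read).
  def scan(groups: Seq[RowGroup], target: Int): (Seq[Int], Int) = {
    val read = groups.filter(mightContain(_, target))
    (read.flatMap(_.groupIds.filter(_ == target)), read.size)
  }

  def main(args: Array[String]): Unit = {
    val target = 10113000

    // Unsorted data (invented values): every row group spans a wide range,
    // so the statistics cannot rule any of them out.
    val unsorted = Seq(
      RowGroup(Vector(10000001, 10113000, 10999999)),
      RowGroup(Vector(10000002, 10500000, 10999998)),
      RowGroup(Vector(10000003, 10113000, 10999997)))

    // The same values, sorted by groupId before being split into row groups.
    val sorted = unsorted
      .flatMap(_.groupIds)
      .sorted
      .grouped(3)
      .map(g => RowGroup(g.toVector))
      .toVector

    println(s"unsorted: row groups read = ${scan(unsorted, target)._2}")
    println(s"sorted:   row groups read = ${scan(sorted, target)._2}")
  }
}
```

With this toy data the unsorted layout reads all 3 row groups and the sorted layout reads only 1, while both return the same two matching values. If that model carries over, the layout of groupId across row groups could matter as much as the flag itself, much like the sorting needed for ORC.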