spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Haviv <>
Subject Re: Why Parquet Predicate Pushdown doesn't work?
Date Wed, 07 Jan 2015 06:42:30 GMT
Quoting Michael:
Predicate push down into the input format is turned off by default because there is a bug
in the current parquet library that null pointers when there are full row groups that are

You can turn it on if you want:


> On 7 בינו׳ 2015, at 08:18, Xuelin Cao <> wrote:
> Hi,
>        I'm testing parquet file format, and the predicate pushdown is a very useful feature
for us.
>        However, it looks like the predicate push down doesn't work after I set 
>        sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
>        Here is my sql:
>        sqlContext.sql("select adId, adTitle  from ad where groupId=10113000").collect
>        Then, I checked the amount of input data on the WEB UI. But the amount of input
data is ALWAYS 80.2M regardless whether I turn the spark.sql.parquet.filterPushdown flag on
or off.
>        I'm not sure, if there is anything that I must do when generating the parquet
file in order to make the predicate pushdown available. (Like ORC file, when creating the
ORC file, I need to explicitly sort the field that will be used for predicate pushdown)
>        Anyone have any idea?
>        And, anyone knows the internal mechanism for parquet predicate pushdown?
>        Thanks

View raw message