spark-user mailing list archives

From Cody Koeninger <c...@koeninger.org>
Subject Re: Why Parquet Predicate Pushdown doesn't work?
Date Wed, 07 Jan 2015 14:14:54 GMT
But Xuelin already posted in the original message that the code was using

SET spark.sql.parquet.filterPushdown=true

On Wed, Jan 7, 2015 at 12:42 AM, Daniel Haviv <danielrulez@gmail.com> wrote:

> Quoting Michael:
> Predicate pushdown into the input format is turned off by default because
> there is a bug in the current Parquet library that throws a null pointer
> exception when a row group consists entirely of nulls.
>
> https://issues.apache.org/jira/browse/SPARK-4258
>
> You can turn it on if you want:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration
>
> Daniel
>
> On 7 Jan 2015, at 08:18, Xuelin Cao <xuelincao@yahoo.com.INVALID> wrote:
>
>
> Hi,
>
>        I'm testing the Parquet file format, and predicate pushdown is a
> very useful feature for us.
>
>        However, it looks like predicate pushdown doesn't work after I set
>        sqlContext.sql("SET spark.sql.parquet.filterPushdown=true")
>
>        Here is my sql:
>        sqlContext.sql("select adId, adTitle from ad where groupId=10113000").collect
>
>        Then I checked the amount of input data in the web UI, but it is
> ALWAYS 80.2M regardless of whether I turn the
> spark.sql.parquet.filterPushdown flag on or off.
>
>        I'm not sure whether there is anything I must do when generating
> the Parquet file to make predicate pushdown available. (With ORC, for
> example, I need to explicitly sort the field that will be used for
> predicate pushdown when creating the file.)
>
>        Does anyone have any idea?
>
>        Also, does anyone know the internal mechanism for Parquet
> predicate pushdown?
>
>        Thanks
>
>
>
>
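
On the question about the internal mechanism: Parquet stores min/max statistics per column for each row group, and a pushed-down filter lets the reader skip any row group whose range cannot contain a match. The toy model below is only a sketch of that idea in plain Scala, not Parquet's or Spark's actual code; RowGroup, toRowGroups, canSkip and rowsRead are hypothetical names, and the data is made up. It also suggests one possible reason the input size might not change: if groupId values are spread across every row group, no group can be skipped.

```scala
// Toy model of statistics-based row-group skipping.
// All names here are hypothetical; this is not the Parquet or Spark API.
final case class RowGroup(values: Vector[Int]) {
  val min: Int = values.min
  val max: Int = values.max
}

object RowGroupSkipping {
  // Split a column into fixed-size row groups; each group records its min/max.
  def toRowGroups(column: Vector[Int], groupSize: Int): Vector[RowGroup] =
    column.grouped(groupSize).map(RowGroup(_)).toVector

  // For `column = target`, a group can be skipped only if target is outside [min, max].
  def canSkip(group: RowGroup, target: Int): Boolean =
    target < group.min || target > group.max

  // Number of rows that still have to be read after skipping.
  def rowsRead(groups: Vector[RowGroup], target: Int): Int =
    groups.filterNot(canSkip(_, target)).map(_.values.size).sum

  def main(args: Array[String]): Unit = {
    // 1000 made-up rows; values cycle over 0..9, so every group spans the full range.
    val unsorted = Vector.tabulate(1000)(i => i % 10)
    val sorted = unsorted.sorted
    val target = 3
    println(s"unsorted: ${rowsRead(toRowGroups(unsorted, 100), target)} rows read")
    println(s"sorted:   ${rowsRead(toRowGroups(sorted, 100), target)} rows read")
  }
}
```

In this model the unsorted column reads all 1000 rows, because every group's range covers the target, while the sorted column reads only the single 100-row group that holds it. So sorting or clustering on the filter column before writing is likely to matter for Parquet much as it does for ORC.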
