spark-user mailing list archives

From Cheng Lian <lian.cs....@gmail.com>
Subject Re: Why custom parquet format hive table execute "ParquetTableScan" physical plan, not "HiveTableScan"?
Date Tue, 20 Jan 2015 19:35:04 GMT
spark.sql.parquet.filterPushdown defaults to false because there is a bug
in Parquet that may cause an NPE; please refer to
http://spark.apache.org/docs/latest/sql-programming-guide.html#configuration

This bug hasn't been fixed in Parquet master yet. We'll turn this on by
default once the bug is fixed.
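
For reference, a minimal sketch of what enabling the flag looks like from
the spark-shell, assuming the shell's SparkContext sc and the test table
discussed below:

    import org.apache.spark.sql.hive.HiveContext

    val sqlContext = new HiveContext(sc)

    // Off by default in 1.2 because of the Parquet NPE bug mentioned above.
    sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")

    // With the flag on, simple predicates like this one can be pushed into
    // the Parquet scan instead of being evaluated after the rows are read.
    sqlContext.sql("SELECT id, msg FROM test WHERE id = '42'").collect()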

Cheng

On 1/19/15 5:02 PM, Xiaoyu Wang wrote:

> spark.sql.parquet.filterPushdown=true has already been turned on. But with 
> spark.sql.hive.convertMetastoreParquet set to false, the first 
> parameter no longer takes effect!
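
As far as the two flags go, spark.sql.parquet.filterPushdown only applies to
Spark SQL's native ParquetTableScan, so once convertMetastoreParquet is false
and the table is read through Hive's code path there is nothing left for it
to act on. A sketch of the combination being described, assuming a
HiveContext named sqlContext:

    // Read metastore Parquet tables through Hive's SerDe and input format
    // (HiveTableScan) instead of Spark SQL's native ParquetTableScan.
    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

    // Only affects the native ParquetTableScan path, so it has no effect on
    // tables that are now scanned via HiveTableScan.
    sqlContext.setConf("spark.sql.parquet.filterPushdown", "true")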
>
> 2015-01-20 6:52 GMT+08:00 Yana Kadiyska <yana.kadiyska@gmail.com>:
>
>     If you're talking about filter pushdown for Parquet files, this
>     also has to be turned on explicitly. Try
>     spark.sql.parquet.filterPushdown=true; it's off by default.
>
>     On Mon, Jan 19, 2015 at 3:46 AM, Xiaoyu Wang <wangxy.jd@gmail.com> wrote:
>
>         Yes, it works!
>         But the filter can't be pushed down!
>
>         Should the custom Parquet input format implement the data sources API instead?
>
>         https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
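
A bare-bones sketch against the 1.2.0 data sources API linked above (the
package, class, and option names here are hypothetical; only the schema
matches the test table below). A relation that extends PrunedFilteredScan
receives the pushed-down predicates directly in buildScan:

    package com.a

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._
    import org.apache.spark.sql.sources._

    // Hypothetical provider; in 1.2 it can be registered from SQL with:
    //   CREATE TEMPORARY TABLE test USING com.a.MyParquetSource OPTIONS (path '...')
    class MyParquetSource extends RelationProvider {
      override def createRelation(
          sqlContext: SQLContext,
          parameters: Map[String, String]): BaseRelation =
        MyParquetRelation(parameters("path"))(sqlContext)
    }

    case class MyParquetRelation(path: String)(@transient val sqlContext: SQLContext)
      extends PrunedFilteredScan {

      override val schema = StructType(
        StructField("id", StringType, nullable = true) ::
        StructField("msg", StringType, nullable = true) :: Nil)

      // `filters` holds the predicates Spark SQL could push down; applying
      // them during the read is an optimization only, since Spark re-applies
      // them afterwards, so handling just a subset is safe.
      override def buildScan(requiredColumns: Array[String],
                             filters: Array[Filter]): RDD[Row] = {
        // This is where the custom Parquet input format would be wired in;
        // empty placeholder below.
        sqlContext.sparkContext.emptyRDD[Row]
      }
    }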
>
>         2015-01-16 21:51 GMT+08:00 Xiaoyu Wang <wangxy.jd@gmail.com>:
>
>             Thanks, Yana!
>             I will try it!
>
>>             On January 16, 2015, at 20:51, yana <yana.kadiyska@gmail.com> wrote:
>>
>>             I think you might need to set
>>             spark.sql.hive.convertMetastoreParquet to false if I
>>             understand that flag correctly
>>
>>
>>
>>             -------- Original message --------
>>             From: Xiaoyu Wang
>>             Date:01/16/2015 5:09 AM (GMT-05:00)
>>             To: user@spark.apache.org
>>             Subject: Why custom parquet format hive table execute
>>             "ParquetTableScan" physical plan, not "HiveTableScan"?
>>
>>             Hi all!
>>
>>             In Spark SQL 1.2.0, I created a Hive table with a custom
>>             Parquet input format and output format, like this:
>>             CREATE TABLE test(
>>               id string,
>>               msg string)
>>             CLUSTERED BY (
>>               id)
>>             SORTED BY (
>>               id ASC)
>>             INTO 10 BUCKETS
>>             ROW FORMAT SERDE
>>               'com.a.MyParquetHiveSerDe'
>>             STORED AS INPUTFORMAT
>>               'com.a.MyParquetInputFormat'
>>             OUTPUTFORMAT
>>               'com.a.MyParquetOutputFormat';
>>
>>             And in the Spark shell, the plan for "select * from test" is:
>>
>>             [== Physical Plan ==]
>>             [!OutputFaker [id#5,msg#6]]
>>             [ ParquetTableScan [id#12,msg#13], (ParquetRelation
>>             hdfs://hadoop/user/hive/warehouse/test.db/test,
>>             Some(Configuration: core-default.xml, core-site.xml,
>>             mapred-default.xml, mapred-site.xml, yarn-default.xml,
>>             yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
>>             org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
>>
>>             Not HiveTableScan!
>>             So it doesn't execute my custom input format!
>>             Why? How can I make it execute my custom input format?
>>
>>             Thanks!
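
As the replies above suggest, disabling the conversion switches the plan back
to HiveTableScan; a quick sketch of checking that from the shell, assuming a
HiveContext named sqlContext:

    // Stop Spark SQL from swapping in its native Parquet scan for
    // metastore Parquet tables.
    sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

    // The physical plan should now contain HiveTableScan, which reads the
    // table through the input format declared in the DDL above.
    sqlContext.sql("EXPLAIN SELECT * FROM test").collect().foreach(println)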
>
>
>
>
