spark-user mailing list archives

From: Xiaoyu Wang <wangxy...@gmail.com>
Subject: Re: Why custom parquet format hive table execute "ParquetTableScan" physical plan, not "HiveTableScan"?
Date: Tue, 20 Jan 2015 01:02:51 GMT
spark.sql.parquet.filterPushdown=true has been turned on. But with
spark.sql.hive.convertMetastoreParquet set to false, the first parameter
has no effect!
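
For reference, a minimal sketch of how the two settings discussed in this thread could be applied from a HiveContext SQL session, followed by a plan check. The table name test comes from the original message; the filter value 'some-id' is a made-up placeholder.

```sql
-- Read the table through its Hive SerDe and custom input format
-- (reported in this thread to produce HiveTableScan instead of ParquetTableScan).
SET spark.sql.hive.convertMetastoreParquet=false;

-- Filter pushdown for Spark's native Parquet scan; off by default per this thread.
SET spark.sql.parquet.filterPushdown=true;

-- Show which scan operator the planner picks ('some-id' is a placeholder value).
EXPLAIN SELECT * FROM test WHERE id = 'some-id';
```

If the plan shows HiveTableScan, the read goes through the custom input format, and as reported above the Parquet pushdown setting then appears to have no effect.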

2015-01-20 6:52 GMT+08:00 Yana Kadiyska <yana.kadiyska@gmail.com>:

> If you're talking about filter pushdown for Parquet files, this also has
> to be turned on explicitly. Try spark.sql.parquet.filterPushdown=true.
> It's off by default.
>
> On Mon, Jan 19, 2015 at 3:46 AM, Xiaoyu Wang <wangxy.jd@gmail.com> wrote:
>
>> Yes, it works!
>> But the filter can't be pushed down!
>>
>> Is implementing the data source API the only option for a custom Parquet
>> input format?
>>
>>
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
>>
>> 2015-01-16 21:51 GMT+08:00 Xiaoyu Wang <wangxy.jd@gmail.com>:
>>
>>> Thanks yana!
>>> I will try it!
>>>
>>> On Jan 16, 2015, at 20:51, yana <yana.kadiyska@gmail.com> wrote:
>>>
>>> I think you might need to set
>>> spark.sql.hive.convertMetastoreParquet to false if I understand that
>>> flag correctly
>>>
>>>
>>>
>>> -------- Original message --------
>>> From: Xiaoyu Wang
>>> Date: 01/16/2015 5:09 AM (GMT-05:00)
>>> To: user@spark.apache.org
>>> Subject: Why custom parquet format hive table execute "ParquetTableScan"
>>> physical plan, not "HiveTableScan"?
>>>
>>> Hi all!
>>>
>>> In Spark SQL 1.2.0, I created a Hive table with a custom Parquet
>>> input format and output format, like this:
>>> CREATE TABLE test(
>>>   id string,
>>>   msg string)
>>> CLUSTERED BY (
>>>   id)
>>> SORTED BY (
>>>   id ASC)
>>> INTO 10 BUCKETS
>>> ROW FORMAT SERDE
>>>   'com.a.MyParquetHiveSerDe'
>>> STORED AS INPUTFORMAT
>>>   'com.a.MyParquetInputFormat'
>>> OUTPUTFORMAT
>>>   'com.a.MyParquetOutputFormat';
>>>
>>> And in the Spark shell, the plan for "select * from test" is:
>>>
>>> [== Physical Plan ==]
>>> [!OutputFaker [id#5,msg#6]]
>>> [ ParquetTableScan [id#12,msg#13], (ParquetRelation
>>> hdfs://hadoop/user/hive/warehouse/test.db/test, Some(Configuration:
>>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>>> yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
>>> org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
>>>
>>> *Not HiveTableScan*!
>>> *So it doesn't execute my custom input format!*
>>> Why? How can I make it execute my custom input format?
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>
