spark-user mailing list archives

From Yana Kadiyska <yana.kadiy...@gmail.com>
Subject Re: Why custom parquet format hive table execute "ParquetTableScan" physical plan, not "HiveTableScan"?
Date Mon, 19 Jan 2015 22:52:11 GMT
If you're talking about filter pushdown for Parquet files, this also has to
be turned on explicitly. Try spark.sql.parquet.filterPushdown=true. It's
off by default.
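
For reference, here is a minimal sketch of how the two settings mentioned in this thread could be applied from a SQL session. It assumes a HiveContext-backed shell on Spark 1.2 where SET statements are accepted; the WHERE clause is only an illustrative filter on the test table from the original question.

```sql
-- Keep the Hive SerDe path so the table's custom input format is used
-- (the flag suggested earlier in this thread).
SET spark.sql.hive.convertMetastoreParquet=false;

-- Turn on Parquet filter pushdown, which is off by default.
SET spark.sql.parquet.filterPushdown=true;

-- Check which scan operator the planner picks for a filtered query.
EXPLAIN SELECT * FROM test WHERE id = 'some-id';
```

These two flags appear to address different scan paths: the pushdown flag concerns Spark's built-in Parquet scan, so it may have no effect once the table is read through HiveTableScan.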

On Mon, Jan 19, 2015 at 3:46 AM, Xiaoyu Wang <wangxy.jd@gmail.com> wrote:

> Yes, it works!
> But the filter can't be pushed down!
>
> Is implementing the data source API the only option for a custom Parquet input format?
>
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala
>
> 2015-01-16 21:51 GMT+08:00 Xiaoyu Wang <wangxy.jd@gmail.com>:
>
>> Thanks yana!
>> I will try it!
>>
>> On Jan 16, 2015, at 20:51, yana <yana.kadiyska@gmail.com> wrote:
>>
>> I think you might need to set
>> spark.sql.hive.convertMetastoreParquet to false if I understand that flag
>> correctly
>>
>>
>>
>> -------- Original message --------
>> From: Xiaoyu Wang
>> Date:01/16/2015 5:09 AM (GMT-05:00)
>> To: user@spark.apache.org
>> Subject: Why custom parquet format hive table execute "ParquetTableScan"
>> physical plan, not "HiveTableScan"?
>>
>> Hi all!
>>
>> I am using Spark SQL 1.2.0.
>> I created a Hive table with a custom Parquet InputFormat and OutputFormat,
>> like this:
>> CREATE TABLE test(
>>   id string,
>>   msg string)
>> CLUSTERED BY (
>>   id)
>> SORTED BY (
>>   id ASC)
>> INTO 10 BUCKETS
>> ROW FORMAT SERDE
>>   'com.a.MyParquetHiveSerDe'
>> STORED AS INPUTFORMAT
>>   'com.a.MyParquetInputFormat'
>> OUTPUTFORMAT
>>   'com.a.MyParquetOutputFormat';
>>
>> In the Spark shell, the plan for "select * from test" is:
>>
>> [== Physical Plan ==]
>> [!OutputFaker [id#5,msg#6]]
>> [ ParquetTableScan [id#12,msg#13], (ParquetRelation
>> hdfs://hadoop/user/hive/warehouse/test.db/test, Some(Configuration:
>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>> yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
>> org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]
>>
>> Not HiveTableScan!
>> So it doesn't execute my custom InputFormat.
>> Why? How can I make it execute my custom InputFormat?
>>
>> Thanks!
>>
>>
>>
>
