spark-user mailing list archives

From yana <yana.kadiy...@gmail.com>
Subject RE: Why custom parquet format hive table execute "ParquetTableScan" physical plan, not "HiveTableScan"?
Date Fri, 16 Jan 2015 12:51:35 GMT
I think you might need to set spark.sql.hive.convertMetastoreParquet to false, if I understand that flag correctly.
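
Something like the following in the Spark shell might do it. This is an untested sketch: the name hiveContext is my own choice, sc is the SparkContext the shell provides, and I am assuming the flag is read when the query is planned.

```scala
// Assumption: running inside spark-shell (Spark 1.2.0), where `sc` already exists.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Ask Spark SQL not to convert metastore Parquet tables to its built-in Parquet scan.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

// Print the plan again to check which scan operator is chosen for the table.
hiveContext.sql("EXPLAIN SELECT * FROM test").collect().foreach(println)
```

If the flag works the way I think it does, the plan should no longer show ParquetTableScan for that table.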


-------- Original message --------
From: Xiaoyu Wang <wangxy.jd@gmail.com>
Date: 01/16/2015 5:09 AM (GMT-05:00)
To: user@spark.apache.org
Subject: Why custom parquet format hive table execute "ParquetTableScan" physical plan, not "HiveTableScan"?

Hi all!

In Spark SQL 1.2.0, I created a Hive table with a custom Parquet InputFormat and OutputFormat, like this:
CREATE TABLE test(
  id string, 
  msg string)
CLUSTERED BY ( 
  id) 
SORTED BY ( 
  id ASC) 
INTO 10 BUCKETS
ROW FORMAT SERDE
  'com.a.MyParquetHiveSerDe'
STORED AS INPUTFORMAT 
  'com.a.MyParquetInputFormat' 
OUTPUTFORMAT 
  'com.a.MyParquetOutputFormat';

In the Spark shell, the plan for "select * from test" is:

[== Physical Plan ==]
[!OutputFaker [id#5,msg#6]]
[ ParquetTableScan [id#12,msg#13], (ParquetRelation hdfs://hadoop/user/hive/warehouse/test.db/test,
Some(Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml), org.apache.spark.sql.hive.HiveContext@6d15a113,
[]), []]

Not HiveTableScan!
So it doesn't execute my custom InputFormat.
Why? How can I make it use my custom InputFormat?

Thanks!