spark-user mailing list archives

From Xiaoyu Wang <wangxy...@gmail.com>
Subject Why does a Hive table with a custom Parquet format execute a "ParquetTableScan" physical plan instead of "HiveTableScan"?
Date Fri, 16 Jan 2015 10:09:05 GMT
Hi all!

I am using Spark SQL 1.2.0.
I created a Hive table with a custom Parquet InputFormat and OutputFormat,
like this:
CREATE TABLE test(
  id string,
  msg string)
CLUSTERED BY (
  id)
SORTED BY (
  id ASC)
INTO 10 BUCKETS
ROW FORMAT SERDE
  'com.a.MyParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.a.MyParquetInputFormat'
OUTPUTFORMAT
  'com.a.MyParquetOutputFormat';

In the Spark shell, the physical plan for "select * from test" is:

[== Physical Plan ==]
[!OutputFaker [id#5,msg#6]]
[ ParquetTableScan [id#12,msg#13], (ParquetRelation
hdfs://hadoop/user/hive/warehouse/test.db/test, Some(Configuration:
core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml),
org.apache.spark.sql.hive.HiveContext@6d15a113, []), []]

It is not HiveTableScan!
So my custom InputFormat is never executed.
Why? How can I make Spark SQL use my custom InputFormat?
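A possible workaround to try, offered as an unverified sketch: Spark SQL 1.2 appears to replace the Hive SerDe path with its own ParquetTableScan for metastore tables whose SerDe class name contains "Parquet" (as com.a.MyParquetHiveSerDe does) whenever the spark.sql.hive.convertMetastoreParquet setting is enabled. Assuming that setting is honored by this build, turning it off in the spark-shell session and re-running EXPLAIN should show whether the plan falls back to HiveTableScan. The snippet assumes the `sc` SparkContext that spark-shell provides.

```scala
import org.apache.spark.sql.hive.HiveContext

// `sc` is the SparkContext created by spark-shell.
val hiveContext = new HiveContext(sc)

// Assumption: when this is "false", Spark SQL reads metastore Parquet tables
// through the table's own SerDe and InputFormat instead of ParquetTableScan.
hiveContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")

// Print the physical plan row by row, in the same [..] form shown above.
hiveContext.sql("EXPLAIN select * from test").collect().foreach(println)
```

If the setting takes effect, the printed plan should contain HiveTableScan, which reads the table through MyParquetInputFormat.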

Thanks!
