When I create an external table in Hive on top of a Parquet file in Spark 2.0.0, querying the table returns all nulls. I believe this is because Spark SQL uses its own Parquet support instead of the Hive SerDe, and there is potentially a mismatch between the two (however, I have checked many times and cannot find one). If I set spark.sql.hive.convertMetastoreParquet to false, I can query the table and get the expected results. However, this has the side effect that I can no longer create Hive tables based on Parquet files. It appears to be related to this thread (https://www.mail-archive.com/user@spark.apache.org/msg55305.html). Note that it does not happen with every external table that I create.


Is this a bug or is it intentional?


Here is my workflow:

Create an external table

Querying the external table returns nulls, but querying the Parquet file directly works fine

Note that the tables have the same number of rows
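One way to look for a mismatch between the table definition and the Parquet file is to compare the two schemas column by column. The following is only a sketch, not a confirmed diagnosis of this problem: schema_differences is a hypothetical helper, and it assumes the inputs are lists of (name, type) pairs such as those returned by the dtypes attribute of a PySpark DataFrame. It compares names case-sensitively, because a difference in letter case is easy to miss when checking by eye.

```python
def schema_differences(table_dtypes, file_dtypes):
    """Compare two lists of (column_name, type_string) pairs.

    Returns a list of (table_column, description) tuples for every table
    column that is missing from the file, differs only in letter case,
    or has a different type string.
    """
    # Index the file columns by lower-cased name so that case-only
    # differences can be told apart from genuinely missing columns.
    file_by_lower = {name.lower(): (name, dtype) for name, dtype in file_dtypes}

    problems = []
    for name, dtype in table_dtypes:
        match = file_by_lower.get(name.lower())
        if match is None:
            problems.append((name, "missing from file"))
        elif match[0] != name:
            problems.append((name, "case differs: " + match[0]))
        elif match[1] != dtype:
            problems.append((name, "type differs: " + dtype + " vs " + match[1]))
    return problems
```

With a SparkSession called spark, this could be called as schema_differences(spark.table(table_name).dtypes, spark.read.parquet(path).dtypes), where table_name and path are placeholders for the external table and the Parquet location.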

If I set convertMetastoreParquet to false, I can query the external table
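A possible way to limit the side effect of this setting is to switch it off only while reading the affected table and restore the previous value afterwards. The sketch below assumes a PySpark SparkSession with Hive support is passed in as spark and that the setting can be changed at runtime for the session. run_with_hive_serde is a hypothetical helper name, and whether restoring the value avoids the CTAS failure has not been verified.

```python
CONVERT_KEY = "spark.sql.hive.convertMetastoreParquet"


def run_with_hive_serde(spark, sql_text):
    """Run one query with convertMetastoreParquet temporarily set to false.

    `spark` is expected to behave like a SparkSession: it needs
    conf.get(key, default), conf.set(key, value) and sql(text).
    """
    # Remember the current value; "true" is used as the fallback default.
    previous = spark.conf.get(CONVERT_KEY, "true")
    spark.conf.set(CONVERT_KEY, "false")
    try:
        # collect() forces evaluation while the setting is still false,
        # since DataFrames are otherwise evaluated lazily.
        return spark.sql(sql_text).collect()
    finally:
        # Restore the previous value even if the query raised an error.
        spark.conf.set(CONVERT_KEY, previous)
```

Collecting the rows inside the helper is a deliberate choice: returning the lazy DataFrame would let it be evaluated after the setting has been restored. It is only suitable for results small enough to fit on the driver.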


However, if I then try to create a table using CTAS, it fails with an ‘alter_table_with_cascade’ error:




Anton Bubna-Litic