spark-user mailing list archives

From Matt Proetsch <mattproet...@gmail.com>
Subject Re: Spark 3 + Delta 0.7.0 Hive Metastore Integration Question
Date Sat, 19 Dec 2020 22:21:39 GMT
Hi Jay,

Some things to check:

Do you have the following set in your Spark SQL config:

"spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
"spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
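For reference, here is a minimal sketch of setting both options when the session is built. It assumes a standalone Scala application launched with spark-submit, which supplies the master. The object name and app name are hypothetical. In spark-shell, you would pass each setting with --conf at launch instead, because spark.sql.extensions is a static SQL config and cannot be changed on a session that is already running.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object; only the two config keys/values come from the list above.
object DeltaSessionSetup {
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("delta-hive-check") // hypothetical app name
      // Static config: must be in place before the first session is created.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .enableHiveSupport() // use the external Hive metastore as the session catalog
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    // Print what the running session actually sees for each key.
    Seq("spark.sql.extensions", "spark.sql.catalog.spark_catalog").foreach { key =>
      println(s"$key = ${spark.conf.getOption(key).getOrElse("<not set>")}")
    }
    spark.stop()
  }
}
```

If either key prints `<not set>`, the session will not resolve Delta tables through the DeltaCatalog.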

Is the JAR for the package delta-core_2.12:0.7.0 available on both your driver and executor classpaths?
(More info https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake)
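One rough way to check this from a running spark-shell (assuming `spark` is the active session) is to try loading a Delta class on the driver and inside a few tasks. The class used here is the same one named in spark.sql.extensions. Four small tasks are not guaranteed to reach every executor, so this is a spot check, not a proof.

```scala
import scala.util.Try

// io.delta.sql.DeltaSparkSessionExtension ships in delta-core; if it cannot be
// loaded in a JVM, that JVM does not have the Delta JAR on its classpath.
val onDriver: Boolean =
  Try(Class.forName("io.delta.sql.DeltaSparkSessionExtension", false,
    Thread.currentThread.getContextClassLoader)).isSuccess

// Run 4 tiny tasks (one element per partition) and repeat the lookup on the
// executors, using each task thread's context class loader, which includes
// JARs added via --jars / --packages.
val onExecutors: Array[Boolean] = spark.sparkContext
  .parallelize(1 to 4, numSlices = 4)
  .map { _ =>
    Try(Class.forName("io.delta.sql.DeltaSparkSessionExtension", false,
      Thread.currentThread.getContextClassLoader)).isSuccess
  }
  .collect()

println(s"driver: $onDriver, executors: ${onExecutors.mkString(", ")}")
```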

Since you are using a non-default metastore version, have you set spark.sql.hive.metastore.version?
(More info https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore)
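As a sketch of what the linked page describes, the version is normally set together with spark.sql.hive.metastore.jars when the session is built. Both are static settings, so they must be in place before the session starts. The version string below is a hypothetical placeholder, not your actual metastore version. The Delta settings from earlier in this reply would also go on the same builder.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper object for illustration.
object MetastoreVersionSetup {
  // HYPOTHETICAL placeholder: substitute the version of your external metastore.
  val metastoreVersion = "2.1.1"

  def buildSession(): SparkSession =
    SparkSession.builder()
      .config("spark.sql.hive.metastore.version", metastoreVersion)
      // "maven" downloads Hive client JARs matching metastoreVersion at startup;
      // a classpath pointing at pre-staged JARs is the other common choice.
      .config("spark.sql.hive.metastore.jars", "maven")
      .enableHiveSupport()
      .getOrCreate()
}
```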

Finally, are you able to read/write Delta tables outside of Hive?
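A quick way to test that is a path-only round trip that never touches the metastore. This sketch assumes spark-shell (so `spark` already exists). The path is hypothetical; any location the cluster can write to, such as a scratch GCS prefix, will do.

```scala
// HYPOTHETICAL scratch location; not a path from the original thread.
val smokePath = "/tmp/delta-smoke-test"

// Write 5 rows (ids 0..4) as a Delta table by path, then read them back.
// No metastore table is created or resolved here.
spark.range(5).write.format("delta").mode("overwrite").save(smokePath)
val readBack = spark.read.format("delta").load(smokePath)
readBack.show()
```

If this works but the metastore-backed table still fails, the problem is more likely in the session/catalog configuration than in Delta itself.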

-Matt

> On Dec 19, 2020, at 13:03, Jay <jayadeep.jayaraman@gmail.com> wrote:
> 
> Hi All -
> 
> I have set up a Spark 3.0.1 cluster with Delta 0.7.0, connected to an external Hive metastore.
> 
> I ran the following commands:
> 
> val tableName = "tblname_2"
> spark.sql(s"CREATE TABLE $tableName(col1 INTEGER) USING delta options(path='GCS_PATH')")
> 20/12/19 17:30:52 WARN org.apache.spark.sql.hive.HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider delta. Persisting data source table `default`.`tblname_2` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
> 
> spark.sql(s"INSERT OVERWRITE $tableName VALUES 5, 6, 7, 8, 9")
> res51: org.apache.spark.sql.DataFrame = []
> 
> spark.sql(s"SELECT * FROM $tableName").show()
> org.apache.spark.sql.AnalysisException: Table does not support reads: default.tblname_2;
> 
> I see a warning related to the Hive Metastore integration, which essentially says that this table cannot be queried via Hive or Presto. That is fine, but when I try to read the data from the same Spark session I get an error. Can someone suggest what the problem might be?
