spark-user mailing list archives

From Jay <jayadeep.jayara...@gmail.com>
Subject Re: Spark 3 + Delta 0.7.0 Hive Metastore Integration Question
Date Sun, 20 Dec 2020 09:49:47 GMT
I think I found the issue: Hive metastore 2.3.6 doesn't have the necessary
support. After upgrading to Hive 3.1.2, I was able to run the select query.
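
For anyone landing on this thread later, here is a minimal sketch of the session setup discussed below, pointed at the upgraded metastore. The two Delta settings are the ones quoted in the thread. The two spark.sql.hive.metastore.* settings and the "maven" value are assumptions based on Matt's suggestion, not something confirmed as required on this cluster. The object name and the table name are placeholders.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes delta-core_2.12:0.7.0 is already on the driver and
// executor classpaths, and that a Hive 3.1.2 metastore is reachable.
object DeltaMetastoreCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("QuickstartSQL")
      // Delta settings quoted in this thread.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      // Assumption: tell Spark which metastore version it is talking to.
      // These are static configs, so they must be set before the session starts.
      .config("spark.sql.hive.metastore.version", "3.1.2")
      // Assumption: "maven" downloads matching Hive client jars; a managed
      // service may expect a local classpath here instead.
      .config("spark.sql.hive.metastore.jars", "maven")
      .enableHiveSupport()
      .getOrCreate()

    // Placeholder table name; this is the read that failed against 2.3.6.
    val tableName = "tblname_3"
    spark.sql(s"SELECT * FROM $tableName").show()

    spark.stop()
  }
}
```

If the SELECT still fails with "Table does not support reads", it may be worth checking that the session was really created with the Delta catalog, since getOrCreate() returns an existing session unchanged when one is already running (as in spark-shell).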


On Sun, 20 Dec 2020 at 12:00, Jay <jayadeep.jayaraman@gmail.com> wrote:

> Thanks Matt.
>
> I have set the two configs in my sparkConfig as below
> val spark = SparkSession.builder()
>   .appName("QuickstartSQL")
>   .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
>   .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
>   .getOrCreate()
>
> I am using a managed Spark service with Delta on Google Cloud and
> therefore all nodes have delta-core_2.12:0.7.0 in /usr/lib/delta/jars/
>
> I am using a managed Hive metastore, version 2.3.6, which is connected to
> my Delta cluster as well as my Presto cluster. When I use the normal Scala
> API, it works without any issues:
>
> spark.read.format("delta").load("pathtoTable").show()
> val deltaTable = DeltaTable.forPath("gs://jayadeep-etl-platform/first-delta-table")
> deltaTable.as("oldData")
>   .merge(merge_df.as("newData"), "oldData.x = newData.x")
>   .whenMatched.update(Map("y" -> col("newData.y")))
>   .whenNotMatched.insert(Map("x" -> col("newData.x")))
>   .execute()
>
> When I issue the following command, it works fine:
> scala> spark.sql(s"SELECT * FROM $tableName")
> res2: org.apache.spark.sql.DataFrame = [col1: int]
>
> But when I call .show() on the result, it returns an error:
> scala> spark.sql(s"SELECT * FROM $tableName").show()
> org.apache.spark.sql.AnalysisException: Table does not support reads:
> default.tblname_3;
>
> -Jay
>
>
> On Sun, 20 Dec 2020 at 03:51, Matt Proetsch <mattproetsch@gmail.com>
> wrote:
>
>> Hi Jay,
>>
>> Some things to check:
>>
>> Do you have the following set in your Spark SQL config:
>>
>> "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension"
>>
>> "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
>>
>>
>> Is the JAR for the package delta-core_2.12:0.7.0 available on both your
>> driver and executor classpaths?
>> (More info
>> https://docs.delta.io/latest/quick-start.html#set-up-apache-spark-with-delta-lake
>> )
>>
>> Since you are using a non-default metastore version, have you set the
>> config spark.sql.hive.metastore.version?
>> (More info
>> https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore
>> )
>>
>> Finally, are you able to read/write Delta tables outside of Hive?
>>
>> -Matt
>>
>> On Dec 19, 2020, at 13:03, Jay <jayadeep.jayaraman@gmail.com> wrote:
>>
>> 
>> Hi All -
>>
>> I have set up a Spark 3.0.1 cluster with Delta 0.7.0, which is connected
>> to an external Hive metastore.
>>
>> I run the following commands:
>>
>> val tableName = "tblname_2"
>> spark.sql(s"CREATE TABLE $tableName(col1 INTEGER) USING delta OPTIONS (path='GCS_PATH')")
>>
>> *20/12/19 17:30:52 WARN org.apache.spark.sql.hive.HiveExternalCatalog:
>> Couldn't find corresponding Hive SerDe for data source provider delta.
>> Persisting data source table `default`.`tblname_2` into Hive metastore in
>> Spark SQL specific format, which is NOT compatible with Hive.*
>>
>> spark.sql(s"INSERT OVERWRITE $tableName VALUES 5, 6, 7, 8, 9")
>> res51: org.apache.spark.sql.DataFrame = []
>>
>> spark.sql(s"SELECT * FROM $tableName").show()
>>
>> *org.apache.spark.sql.AnalysisException: Table does not support reads:
>> default.tblname_2;*
>>
>> I see a warning about the Hive metastore integration, which essentially
>> says that this table cannot be queried via Hive or Presto. That is fine,
>> but when I try to read the data from the same Spark session, I get an
>> error. Can someone suggest what the problem might be?
>>
>>
