spark-user mailing list archives

From Ravi Shankar <>
Subject Re: Spark hive build and connectivity
Date Thu, 22 Oct 2020 19:54:54 GMT
Thanks! I have a very similar setup. I have built Spark with -Phive, which
includes the hive-2.3.7 jars, the spark-hive* jars, and some hadoop-common* jars.

At runtime, I set SPARK_DIST_CLASSPATH=$(hadoop classpath)

and set spark.sql.hive.metastore.version (to the cluster's Hive version) and
spark.sql.hive.metastore.jars to $HIVE_HOME/lib/*.
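For reference, the runtime setup described above looks roughly like this (a sketch; the Hive version "3.1.2" and the job file name are placeholders for whatever your cluster actually runs):

```shell
# Make the Hadoop client jars visible to Spark; run before launching the driver.
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# Point Spark's metastore client at the cluster's own Hive jars instead of
# the built-in 2.3.7 ones. Version and paths below are placeholders.
spark-submit \
  --conf spark.sql.hive.metastore.version=3.1.2 \
  --conf "spark.sql.hive.metastore.jars=$HIVE_HOME/lib/*" \
  my_job.py
```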

With this, I am able to read from and write to Hive successfully from my
Spark jobs. So my question and doubt is the same as yours: is it just working
by chance? How and when does Spark use the hive-2.3.7* jars as opposed to
the metastore jars?

What if my Hive tables use some SerDes and functions from my Hive 3.x
cluster? How will Spark be able to use them at runtime? Hope someone has
a clear understanding of how Spark works with Hive.
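If a table does depend on a custom SerDe, one workaround is to ship that jar explicitly so it lands on the execution classpath too; a hypothetical example (the jar path and version below are illustrative, not from this thread):

```shell
# Illustrative only: /opt/serde/custom-serde.jar stands in for whatever jar
# provides the SerDe classes the Hive 3.x tables reference.
spark-submit \
  --jars /opt/serde/custom-serde.jar \
  --conf spark.sql.hive.metastore.version=3.1.2 \
  --conf "spark.sql.hive.metastore.jars=$HIVE_HOME/lib/*" \
  my_job.py
```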

On Thu, Oct 22, 2020 at 12:48 PM Kimahriman <> wrote:

> I have always been a little confused about the different hive-version
> integration as well. To expand on this question, we have a Hive 3.1.1
> metastore that we can successfully interact with using the -Phive profile
> with Hive 2.3.7. We do not use the Hive 3.1.1 jars anywhere in our Spark
> applications. Are we just lucky that the 2.3.7 jars are compatible for our
> use cases with the 3.1.1 metastore? Or are the
> `spark.sql.hive.metastore.jars` only used if you are using a direct JDBC
> connection and acting as the metastore?
> Also FWIW, the documentation only claims compatibility up to Hive version
> 3.1.2. Not sure if there are any breaking changes in 3.2 and beyond.
