spark-user mailing list archives

From Silvio Fiorito <>
Subject Re: HiveContext standalone => without a Hive metastore
Date Thu, 26 May 2016 12:01:26 GMT
Hi Gerard,

I’ve never had an issue using the HiveContext without a hive-site.xml configured. One issue
you may run into, though, is that if multiple users start a HiveContext from the same path,
they’ll all try to create the default Derby metastore in the same location. Also, if you want
them to be able to persist permanent table metadata for Spark SQL, you’ll want to set up a
real metastore.
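
To keep multiple users from colliding on the same Derby directory, one option is to point each user's embedded metastore at a per-user path. A minimal hive-site.xml sketch, assuming Spark 1.x with the file placed on the driver's classpath (e.g. $SPARK_HOME/conf/); the path under /tmp is just an illustration:

```xml
<!-- hive-site.xml: per-user location for the embedded Derby metastore -->
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- ${user.name} is expanded from the JVM system property by the
         Hadoop Configuration machinery; adjust the base path as needed -->
    <value>jdbc:derby:;databaseName=/tmp/${user.name}/metastore_db;create=true</value>
  </property>
</configuration>
```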

The other possibility is Hive dependency collisions on the classpath, but that shouldn’t
be an issue since you said it’s standalone (not a Hadoop distro, right?).


From: Gerard Maas <>
Date: Thursday, May 26, 2016 at 5:28 AM
To: spark users <>
Subject: HiveContext standalone => without a Hive metastore


I'm helping some folks set up an analytics cluster with Spark.
They want to use the HiveContext to enable window functions on DataFrames(*), but they
don't have any Hive installation, nor do they need one at the moment (if it's not necessary for this).
When we try to create a Hive context, we get the following error:

> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
       at org.apache.hadoop.hive.ql.session.SessionState.start(

Is my HiveContext failing because it wants to connect to an unconfigured Hive Metastore?

Is there  a way to instantiate a HiveContext for the sake of Window support without an underlying
Hive deployment?

The docs are explicit in saying that this should be the case: [1]

"To use a HiveContext, you do not need to have an existing Hive setup, and all of the data
sources available to a SQLContext are still available. HiveContext is only packaged separately
to avoid including all of Hive’s dependencies in the default Spark build."

So what is the right way to address this issue? How do I instantiate a HiveContext with Spark
running on an HDFS cluster without Hive deployed?
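
For reference, this is the plain instantiation being attempted; a sketch assuming Spark 1.x built with Hive support (-Phive) and a writable working directory, since without a hive-site.xml the HiveContext creates an embedded Derby metastore (metastore_db/) right where the driver runs:

```scala
// Requires the spark-hive module on the classpath (spark-hive_2.10/2.11
// artifact, or a Spark build made with -Phive).
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc) // sc: the existing SparkContext
// No Hive deployment needed: Derby metastore files are created lazily
// in the current working directory on first use.
sqlContext.sql("SELECT 1").show()
```

If the working directory (or /tmp/hive) is not writable, the SessionHiveMetaStoreClient instantiation fails with an error much like the one above, so that is worth checking first.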

Thanks a lot!


(*) The need for a HiveContext to use Window functions is pretty obscure. The only documentation
of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException: Could not
resolve window function 'max'. Note that, using window functions currently requires a HiveContext;"
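
For completeness, a sketch of the window-function usage that triggers this requirement, assuming Spark 1.4+ and a hypothetical DataFrame df with columns "dept" and "salary" (names are illustrative only):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// Window spec partitioned by department; evaluating max() over it
// requires the DataFrame to come from a HiveContext, not a plain SQLContext.
val byDept = Window.partitionBy("dept")
val withMax = df.withColumn("deptMax", max("salary").over(byDept))
```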
