spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: HiveContext standalone => without a Hive metastore
Date Thu, 26 May 2016 13:06:19 GMT
To use HiveContext witch is basically an sql api within Spark without
proper hive set up does not make sense. It is a super set of Spark
SQLContext

In addition simple things like registerTempTable may not work.

HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fiorito@granturing.com>
wrote:

> Hi Gerard,
>
>
>
> I’ve never had an issue using the HiveContext without a hive-site.xml
> configured. However, one issue you may have is if multiple users are
> starting the HiveContext from the same path, they’ll all be trying to store
> the default Derby metastore in the same location. Also, if you want them to
> be able to persist permanent table metadata for SparkSQL then you’ll want
> to set up a true metastore.
>
>
>
> The other thing it could be is Hive dependency collisions from the
> classpath, but that shouldn’t be an issue since you said it’s standalone
> (not a Hadoop distro right?).
>
>
>
> Thanks,
>
> Silvio
>
>
>
> *From: *Gerard Maas <gerard.maas@gmail.com>
> *Date: *Thursday, May 26, 2016 at 5:28 AM
> *To: *spark users <user@spark.apache.org>
> *Subject: *HiveContext standalone => without a Hive metastore
>
>
>
> Hi,
>
>
>
> I'm helping some folks setting up an analytics cluster with  Spark.
>
> They want to use the HiveContext to enable the Window functions on
> DataFrames(*) but they don't have any Hive installation, nor they need one
> at the moment (if not necessary for this feature)
>
>
>
> When we try to create a Hive context, we get the following error:
>
>
>
> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>
>        at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>
>
>
> Is my HiveContext failing b/c it wants to connect to an unconfigured  Hive
> Metastore?
>
>
>
> Is there  a way to instantiate a HiveContext for the sake of Window
> support without an underlying Hive deployment?
>
>
>
> The docs are explicit in saying that that is should be the case: [1]
>
>
>
> "To use a HiveContext, you do not need to have an existing Hive setup,
> and all of the data sources available to aSQLContext are still available.
> HiveContext is only packaged separately to avoid including all of Hive’s
> dependencies in the default Spark build."
>
>
>
> So what is the right way to address this issue? How to instantiate a
> HiveContext with spark running on a HDFS cluster without Hive deployed?
>
>
>
>
>
> Thanks a lot!
>
>
>
> -Gerard.
>
>
>
> (*) The need for a HiveContext to use Window functions is pretty obscure.
> The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
> Could not resolve window function 'max'. Note that, using window functions
> currently requires a HiveContext;"
>
>
>
> [1]
> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>

Mime
View raw message