spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gerard Maas <gerard.m...@gmail.com>
Subject Re: HiveContext standalone => without a Hive metastore
Date Thu, 26 May 2016 18:09:31 GMT
Thanks a lot for the advice!.

I found out why the standalone hiveContext would not work:  it was trying
to deploy a derby db and the user had no rights to create the dir where
there db is stored:

Caused by: java.sql.SQLException: Failed to create database 'metastore_db',
see the next exception for details.

       at
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
Source)

       at
org.apache.derby.impl.jdbc.SQLExceptionFactory40.wrapArgsForTransportAcrossDRDA(Unknown
Source)

       ... 129 more

Caused by: java.sql.SQLException: Directory
/usr/share/spark-notebook/metastore_db cannot be created.


Now, the new issue is that we can't start more than 1 context at the same
time. I think we will need to setup a proper metastore.


-kind regards, Gerard.




On Thu, May 26, 2016 at 3:06 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> To use HiveContext witch is basically an sql api within Spark without
> proper hive set up does not make sense. It is a super set of Spark
> SQLContext
>
> In addition simple things like registerTempTable may not work.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 26 May 2016 at 13:01, Silvio Fiorito <silvio.fiorito@granturing.com>
> wrote:
>
>> Hi Gerard,
>>
>>
>>
>> I’ve never had an issue using the HiveContext without a hive-site.xml
>> configured. However, one issue you may have is if multiple users are
>> starting the HiveContext from the same path, they’ll all be trying to store
>> the default Derby metastore in the same location. Also, if you want them to
>> be able to persist permanent table metadata for SparkSQL then you’ll want
>> to set up a true metastore.
>>
>>
>>
>> The other thing it could be is Hive dependency collisions from the
>> classpath, but that shouldn’t be an issue since you said it’s standalone
>> (not a Hadoop distro right?).
>>
>>
>>
>> Thanks,
>>
>> Silvio
>>
>>
>>
>> *From: *Gerard Maas <gerard.maas@gmail.com>
>> *Date: *Thursday, May 26, 2016 at 5:28 AM
>> *To: *spark users <user@spark.apache.org>
>> *Subject: *HiveContext standalone => without a Hive metastore
>>
>>
>>
>> Hi,
>>
>>
>>
>> I'm helping some folks setting up an analytics cluster with  Spark.
>>
>> They want to use the HiveContext to enable the Window functions on
>> DataFrames(*) but they don't have any Hive installation, nor they need one
>> at the moment (if not necessary for this feature)
>>
>>
>>
>> When we try to create a Hive context, we get the following error:
>>
>>
>>
>> > val sqlContext = new org.apache.spark.sql.hive.HiveContext(sparkContext)
>>
>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>
>>        at
>> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>>
>>
>>
>> Is my HiveContext failing b/c it wants to connect to an unconfigured
>>  Hive Metastore?
>>
>>
>>
>> Is there  a way to instantiate a HiveContext for the sake of Window
>> support without an underlying Hive deployment?
>>
>>
>>
>> The docs are explicit in saying that that is should be the case: [1]
>>
>>
>>
>> "To use a HiveContext, you do not need to have an existing Hive setup,
>> and all of the data sources available to aSQLContext are still
>> available. HiveContext is only packaged separately to avoid including
>> all of Hive’s dependencies in the default Spark build."
>>
>>
>>
>> So what is the right way to address this issue? How to instantiate a
>> HiveContext with spark running on a HDFS cluster without Hive deployed?
>>
>>
>>
>>
>>
>> Thanks a lot!
>>
>>
>>
>> -Gerard.
>>
>>
>>
>> (*) The need for a HiveContext to use Window functions is pretty obscure.
>> The only documentation of this seems to be a runtime exception: "org.apache.spark.sql.AnalysisException:
>> Could not resolve window function 'max'. Note that, using window functions
>> currently requires a HiveContext;"
>>
>>
>>
>> [1]
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#getting-started
>>
>
>

Mime
View raw message