spark-user mailing list archives

From Chen Song <chen.song...@gmail.com>
Subject Re: Spark SQL and Hive tables
Date Tue, 30 Sep 2014 15:56:11 GMT
I have run into the same issue. I understand that with the new assembly built
with -Phive, I can run a Spark job in yarn-cluster mode. But is there a way
for me to run spark-shell with Hive support?

I tried to add the new assembly jar with --driver-library-path
and --driver-class-path, but neither works. I keep seeing the same exception:

object hive is not a member of package org.apache.spark.sql
       val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
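
One approach that worked in Spark builds of this era (a sketch; paths are illustrative and not taken from the thread) is to rebuild the assembly with the Hive profile and then launch spark-shell from that same build tree, so bin/spark-shell picks up the Hive-enabled assembly automatically instead of relying on --driver-class-path:

```shell
# From the Spark source root: rebuild the assembly with Hive support
./sbt/sbt -Phive assembly/assembly

# Launch spark-shell from the same build tree; the launcher scripts will
# put the freshly built Hive-enabled assembly jar on the classpath
./bin/spark-shell
```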

On Fri, Jul 25, 2014 at 6:25 PM, sstilak <sstilak@live.com> wrote:

>  Thanks!  Will do.
>
>
>  Sent via the Samsung GALAXY S®4, an AT&T 4G LTE smartphone
>
>
> -------- Original message --------
> From: Michael Armbrust
> Date:07/25/2014 3:24 PM (GMT-08:00)
> To: user@spark.apache.org
> Subject: Re: Spark SQL and Hive tables
>
>   [S]ince Hive has a large number of dependencies, it is not included in
> the default Spark assembly. In order to use Hive you must first run ‘SPARK_HIVE=true
> sbt/sbt assembly/assembly’ (or use -Phive for maven). This command builds
> a new assembly jar that includes Hive. Note that this Hive assembly jar
> must also be present on all of the worker nodes, as they will need access
> to the Hive serialization and deserialization libraries (SerDes) in order
> to access data stored in Hive.
>
>
>
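The build variants Michael describes can be written out as commands (a sketch; run from the Spark source root, and note the exact Maven invocation is an assumption based on standard Spark builds of that era):

```shell
# sbt build with Hive support
SPARK_HIVE=true sbt/sbt assembly/assembly

# or the equivalent Maven build using the -Phive profile
mvn -Phive -DskipTests clean package
```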
> On Fri, Jul 25, 2014 at 3:20 PM, Sameer Tilak <sstilak@live.com> wrote:
>
>  Hi Jerry,
>
>  I am having trouble with this. Maybe something is wrong with my import or
> version, etc.
>
>  scala> import org.apache.spark.sql._;
> import org.apache.spark.sql._
>
>  scala> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> <console>:24: error: object hive is not a member of package
> org.apache.spark.sql
>        val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>                                                   ^
> Here is what I see for autocompletion:
>
>  scala> org.apache.spark.sql.
> Row             SQLContext      SchemaRDD       SchemaRDDLike   api
> catalyst        columnar        execution       package         parquet
> test
>
>
>  ------------------------------
> Date: Fri, 25 Jul 2014 17:48:27 -0400
>
> Subject: Re: Spark SQL and Hive tables
>  From: chilinglam@gmail.com
> To: user@spark.apache.org
>
>
> Hi Sameer,
>
>  The blog post you referred to is about Spark SQL, but I don't think it is
> intended to guide you on how to read data from Hive via Spark SQL. So don't
> worry too much about the blog post.
>
>  The programming guide I referred to demonstrates how to read data from
> Hive using Spark SQL. It is a good starting point.
>
>  Best Regards,
>
>  Jerry
>
>
> On Fri, Jul 25, 2014 at 5:38 PM, Sameer Tilak <sstilak@live.com> wrote:
>
>  Hi Michael,
> Thanks. I am not creating HiveContext, I am creating SQLContext. I am
> using CDH 5.1. Can you please let me know which conf/ directory you are
> talking about?
>
>  ------------------------------
> From: michael@databricks.com
> Date: Fri, 25 Jul 2014 14:34:53 -0700
>
> Subject: Re: Spark SQL and Hive tables
>  To: user@spark.apache.org
>
>
> In particular, have you put your hive-site.xml in the conf/ directory?
> Also, are you creating a HiveContext instead of a SQLContext?
>
>
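Michael's two checks can be sketched in spark-shell (a hedged sketch against the Spark 1.0-era API, where hql was the HiveQL entry point; it assumes the shell was started from a Hive-enabled build and conf/hive-site.xml points at the Hive metastore):

```scala
// Create a HiveContext (not a plain SQLContext) over the existing SparkContext
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
import hiveContext._

// hql() parses statements as HiveQL and resolves tables registered
// in the metastore configured by conf/hive-site.xml
hql("SHOW TABLES").collect().foreach(println)
```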
> On Fri, Jul 25, 2014 at 2:27 PM, Jerry Lam <chilinglam@gmail.com> wrote:
>
> Hi Sameer,
>
>  Maybe this page will help you:
> https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
>
>  Best Regards,
>
>  Jerry
>
>
>
> On Fri, Jul 25, 2014 at 5:25 PM, Sameer Tilak <sstilak@live.com> wrote:
>
>  Hi All,
> I am trying to load data from Hive tables using Spark SQL. I am using
> spark-shell. Here is what I see:
>
>  val trainingDataTable = sql("""SELECT prod.prod_num,
> demographics.gender, demographics.birth_year, demographics.income_group
>  FROM prod p JOIN demographics d ON d.user_id = p.user_id""")
>
>  14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> MultiInstanceRelations
> 14/07/25 14:18:46 INFO Analyzer: Max iterations (2) reached for batch
> CaseInsensitiveAttributeReferences
> java.lang.RuntimeException: Table Not Found: prod.
>
>  I have these tables in Hive; I used the SHOW TABLES command to confirm
> this. Can someone please let me know how I can make them accessible here?
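
For reference, the query above aliases the tables as p and d but then qualifies the columns with the full table names. A consistent version run against a HiveContext would look roughly like this (a sketch only, assuming the prod and demographics tables live in the Hive metastore):

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Column qualifiers must match the declared aliases (p/d),
// not the underlying table names (prod/demographics)
val trainingDataTable = hiveContext.hql("""
  SELECT p.prod_num, d.gender, d.birth_year, d.income_group
  FROM prod p JOIN demographics d ON d.user_id = p.user_id""")
```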
>
>
>
>
>
>


-- 
Chen Song
