spark-dev mailing list archives

From Shivaram Venkataraman <shiva...@eecs.berkeley.edu>
Subject Re: SparkR Reading Tables from Hive
Date Mon, 08 Jun 2015 20:59:41 GMT
Thanks for the confirmation - I was just going to send a pointer to the
documentation that talks about hive-site.xml.
http://people.apache.org/~pwendell/spark-releases/latest/sql-programming-guide.html#hive-tables
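For anyone hitting the same symptom, a minimal sketch of the fix described above: copy hive-site.xml into Spark's conf directory so the HiveContext connects to the right metastore. The HIVE_CONF and SPARK_HOME paths below are assumptions; adjust them for your installation.

```shell
# Assumed locations -- adjust HIVE_CONF and SPARK_HOME for your installation.
HIVE_CONF="${HIVE_CONF:-/etc/hive/conf}"
SPARK_HOME="${SPARK_HOME:-/opt/spark}"

# Spark's HiveContext reads hive-site.xml from $SPARK_HOME/conf; without it,
# Spark falls back to a local metastore that knows nothing about your tables.
if [ -f "$HIVE_CONF/hive-site.xml" ]; then
  cp "$HIVE_CONF/hive-site.xml" "$SPARK_HOME/conf/"
  echo "hive-site.xml copied to $SPARK_HOME/conf"
else
  echo "hive-site.xml not found under $HIVE_CONF -- locate it and copy it manually" >&2
fi
```

After restarting SparkR and re-running sparkRHive.init(sc), `show tables` should list the same tables you see in beeline.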

Thanks
Shivaram

On Mon, Jun 8, 2015 at 1:57 PM, Eskilson, Aleksander <
Alek.Eskilson@cerner.com> wrote:

>  Resolved, my hive-site.xml wasn’t in the conf folder. I can load tables
> into DataFrames as expected.
>
>  Thanks,
> Alek
>
>   From: Aleksander Eskilson <Alek.Eskilson@cerner.com>
> Date: Monday, June 8, 2015 at 3:38 PM
> To: "dev@spark.apache.org" <dev@spark.apache.org>
> Subject: SparkR Reading Tables from Hive
>
>   Hi there,
>
>  I’m testing out the new SparkR-Hive interop right now. I’m noticing an
> apparent disconnect between the Hive store my data is loaded into and the
> store that sparkRHive.init() connects to. For example, in beeline:
>
>  0: jdbc:hive2://quickstart.cloudera:10000> show databases;
>  +---------------+--+
>  | database_name |
>  +---------------+--+
>  | default       |
>  +---------------+--+
>  0: jdbc:hive2://quickstart.cloudera:10000> show tables;
>  +---------------+--+
> | tab_name      |
> +---------------+--+
> | my_table      |
> +---------------+--+
>
>  But in sparkR:
>
>  > hqlContext <- sparkRHive.init(sc)
>  > showDF(sql(hqlContext, "show databases"))
>  +---------+
>  | result  |
>  +---------+
>  | default |
>  +---------+
>  > showDF(tables(hqlContext, "default"))
>  +-----------+-------------+
>  | tableName | isTemporary |
>  +-----------+-------------+
>  +-----------+-------------+
>  > showDF(sql(hqlContext, "show tables"))
>  +-----------+-------------+
>  | tableName | isTemporary |
>  +-----------+-------------+
>  +-----------+-------------+
>
>  The data in my_table was landed into Hive from a CSV via kite-dataset.
> The installation of Spark I’m working with was built separately, and
> operates as standalone. Could it be that sparkRHive.init() is getting the
> wrong address of the Hive metastore? How could I peer into the context and
> see what the address is set to, and if it’s wrong, reset it?
>
>  Ultimately, I’d like to be able to read my_table from Hive into a SparkR
> DataFrame, which ought to be possible with
> > result <- sql(hqlContext, "SELECT * FROM my_table")
> But this fails with:
> org.apache.spark.sql.AnalysisException: no such table my_table; line 1 pos
> 14
> which is expected, I suppose, since we don’t see the table in the listing
> above.
>
>  Any thoughts?
>
>  Thanks,
> Alek Eskilson
>
