Are you running on YARN?

 - If you are running in yarn-client mode, set HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory where your hive-site.xml is located).
 - If you are running in yarn-cluster mode, the easiest thing to do is to add --files=/etc/hive/conf/hive-site.xml (or the path to your hive-site.xml) to your spark-submit script.
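
As a rough illustration of the two cases above, here is a small Python helper that assembles the spark-submit invocation for each mode. It is only a sketch: build_spark_submit, my_job.py, and the default directory are hypothetical names chosen for this example, and yarn-client / yarn-cluster are assumed to be the --master values in use.

```python
import os
import posixpath


def build_spark_submit(mode, app, hive_conf_dir="/etc/hive/conf"):
    """Return (env, argv) for a spark-submit call in the given YARN mode.

    Hypothetical helper: `app` and `hive_conf_dir` are placeholders.
    """
    env = dict(os.environ)
    argv = ["spark-submit", "--master", mode]
    if mode == "yarn-client":
        # The driver runs on the submitting machine, so point it at the
        # directory that holds hive-site.xml.
        env["HADOOP_CONF_DIR"] = hive_conf_dir
    elif mode == "yarn-cluster":
        # The driver runs inside the cluster, so ship hive-site.xml with the job.
        argv.append("--files=" + posixpath.join(hive_conf_dir, "hive-site.xml"))
    else:
        raise ValueError("expected yarn-client or yarn-cluster, got %r" % mode)
    argv.append(app)
    return env, argv
```

The returned pair could then be passed to subprocess.call(argv, env=env) on a machine where spark-submit is on the PATH.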

On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:
I can recreate the tables, but what about the data? This looks like an obvious feature that Spark SQL must have. People will want to transform large amounts of data stored in HDFS through Hive from Spark SQL.

The Spark programming guide suggests it's possible.


Spark SQL also supports reading and writing data stored in Apache Hive.  .... Configuration of Hive is done by placing your hive-site.xml file in conf/.
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables

For some reason it's not working.


On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <arush@sigmoidanalytics.com> wrote:
It seems Spark SQL accesses some columns in addition to those created by Hive.

You can always recreate the tables by executing the table creation scripts, but it would be better to avoid recreating them.

On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:
I did copy hive-conf.xml from the Hive installation into spark-home/conf. It does have all the metastore connection details: host, username, password, driver, and others.



Snippet
======


<configuration>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>some-password</value>
  <description>password to use against metastore database</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>

......



When I attempt to read a Hive table, it does not work: dw_bid does not exist.

I am sure there is a way to read tables stored in HDFS (Hive) from Spark SQL. Otherwise, how would anyone do analytics, since the source tables are always persisted either directly on HDFS or through Hive?
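
For reference, the Spark 1.3 SQL programming guide reads Hive tables through a HiveContext. The sketch below checks whether a table such as dw_bid is visible before querying it. It assumes PySpark 1.3 (HiveContext, tableNames, and DataFrame.show come from that API); hive_table_visible, main, and the application name are hypothetical names for this example, and the code does not diagnose why a table might be missing.

```python
def hive_table_visible(sql_context, table_name):
    """Return True if table_name is listed in the context's current database."""
    wanted = table_name.lower()
    return any(name.lower() == wanted for name in sql_context.tableNames())


def main():
    # Needs a Spark installation; the imports are local so that the helper
    # above can be used (and tested) without PySpark being installed.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext(appName="hive-visibility-check")  # hypothetical app name
    hive_ctx = HiveContext(sc)
    if hive_table_visible(hive_ctx, "dw_bid"):
        hive_ctx.sql("SELECT * FROM dw_bid LIMIT 10").show()
    else:
        print("dw_bid is not visible; Spark may not be reading hive-site.xml")
    sc.stop()
```

Calling main() from a script submitted with spark-submit would run the check against whatever metastore the job is configured to use.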


On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <arush@sigmoidanalytics.com> wrote:
Hive and Spark SQL both use HDFS and the Hive metastore internally, so the only thing you want to change is the processing engine. You can try copying your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml. (Ensure that the hive-site.xml captures the metastore connection details.)

It's a hack; I haven't tried it. I have played around with the metastore, and it should work.
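
One way to check that a hive-site.xml captures the metastore connection details is to parse it and look for the expected properties. The following is a minimal sketch using only the Python standard library. The REQUIRED list is an assumption based on the javax.jdo.option.* properties discussed in this thread; a setup that reaches the metastore some other way (for example through hive.metastore.uris) would need a different list. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed set of keys for a JDBC-backed metastore; adjust for your setup.
REQUIRED = (
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)


def read_properties(xml_text):
    """Map each <property> name to its value in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_metastore_settings(xml_text):
    """Return the assumed-required keys that are absent or empty."""
    props = read_properties(xml_text)
    return [key for key in REQUIRED if not props.get(key)]
```

Reading the copy under the Spark conf directory and passing its text to missing_metastore_settings returns an empty list when all four keys are set.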

On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:
I have a few tables that were created in Hive. I want to transform the data stored in these Hive tables using Spark SQL. Is this even possible?

So far I have seen that I can create new tables using the Spark SQL dialect. However, when I run show tables or desc hive_table, it says the table is not found.

I am now wondering whether this support is present in Spark SQL.

--
Deepak




--

Sigmoid Analytics

Arush Kharbanda || Technical Teamlead

arush@sigmoidanalytics.com || www.sigmoidanalytics.com



