spark-user mailing list archives

From Mithila Joshi <joshi.mith...@gmail.com>
Subject Fail to load hive tables through Spark
Date Thu, 23 Jul 2015 20:20:28 GMT
I am new to Spark and need help figuring out why my Hive databases are
not accessible when I try to load data through Spark.

Background:

   1. I am running Hive, Spark, and my Java program on a single machine. It's
      a Cloudera QuickStart VM, CDH5.4x, on VirtualBox.
   2. I have downloaded pre-built Spark 1.3.1.
   3. I am using the Hive bundled with the VM and can run Hive queries through
      spark-shell and the Hive command line without any issue. This includes
      running the command:

 LOAD DATA INPATH
'hdfs://quickstart.cloudera:8020/user/cloudera/test_table/result.parquet/'
INTO TABLE test_spark.test_table PARTITION(part = '2015-08-21');

Problem:

I am writing a Java program to read data from Cassandra and load it into
Hive. I have saved the results of the Cassandra read in parquet format in a
folder called 'result.parquet'.

Now I would like to load this into Hive. For this, I

   1. Copied hive-site.xml to the Spark conf folder.
      - I made a change to this XML file. I noticed that I had two
        hive-site.xml files: one that was auto-generated and another that had
        Hive execution parameters. I combined both into a single hive-site.xml.
   2. Code used (Java):

   HiveContext hiveContext =
       new HiveContext(JavaSparkContext.toSparkContext(sc));
   hiveContext.sql("show databases").show();
   // A Java string literal cannot span lines, so the statement is concatenated.
   hiveContext.sql("LOAD DATA INPATH "
       + "'hdfs://quickstart.cloudera:8020/user/cloudera/test_table/result.parquet/' "
       + "INTO TABLE test_spark.test_table PARTITION(part = '2015-08-21')").show();


This worked, and I could load data into Hive. However, after I restarted
my VM, it stopped working.

When I run the show databases Hive query, I get a result saying

result
default

instead of the databases in Hive, which are

default
test_spark

I also notice a folder called metastore_db being created in my project
folder. From googling around, I know this happens when Spark can't connect
to the Hive metastore, so it creates one of its own. I thought I had fixed
that, but clearly not.
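
One way to narrow this down, assuming HiveContext picks up the metastore
settings from a hive-site.xml on the application classpath, is to check
whether that file is visible to the JVM running the Java program. The sketch
below uses only the JDK; the class name HiveSiteCheck and its methods are
hypothetical and are not part of Spark or Hive.

```java
import java.net.URL;

public class HiveSiteCheck {

    // Returns the classpath location of hive-site.xml, or null if it is not visible.
    static URL findHiveSite() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = HiveSiteCheck.class.getClassLoader();
        }
        return cl.getResource("hive-site.xml");
    }

    // Turns the lookup result into a readable diagnostic message.
    static String describe(URL hiveSite) {
        if (hiveSite == null) {
            return "hive-site.xml not found on classpath";
        }
        return "hive-site.xml found at " + hiveSite;
    }

    public static void main(String[] args) {
        System.out.println(describe(findHiveSite()));
    }
}
```

If this reports that the file is not found when the program is launched the
same way as the failing job, the Spark conf folder is probably not on that
classpath, which would be consistent with a local metastore_db being created.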

What am I doing wrong?


Best,

Mithila
