The "mysql" command line client doesn't use JDBC to talk to the MySQL server, so this doesn't verify anything.
I think this Hive metastore installation guide from Cloudera may be helpful. Although the document is for CDH4, the general steps are the same and should help you figure out the relationships here.
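For what it's worth, the distinction can be reproduced with plain JDK code: java.sql.DriverManager only knows about drivers registered on its classloader, so the same host and credentials that work for the mysql client produce exactly the error from this thread when no MySQL driver jar is present. A minimal sketch (host, database, and credentials below are placeholders):

```java
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcProbe {
    public static void main(String[] args) {
        // Placeholder connection details; succeeding with the mysql CLI
        // says nothing about whether this call can find a JDBC driver.
        String url = "jdbc:mysql://hostname.vip.company.com:3306/HDB";
        try {
            DriverManager.getConnection(url, "hiveuser", "pass");
            System.out.println("connected");
        } catch (SQLException e) {
            // With no MySQL driver registered on this classloader, this prints:
            //   No suitable driver found for jdbc:mysql://...
            System.out.println(e.getMessage());
        }
    }
}
```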
On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
I am able to connect to MySQL Hive metastore from the client cluster machine.
-sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9417286
Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> use eBayHDB;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> show tables;
+---------------------------+
| Tables_in_HDB             |
On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <firstname.lastname@example.org> wrote:
Yes, I am using yarn-cluster and I did add it via --files. I still get the "No suitable driver found" error.
Please share the spark-submit command that shows the mysql jar (containing the driver class used to connect to the Hive MySQL metastore).
Even after including it through --driver-class-path
OR (AND)
--jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
I keep getting "No suitable driver found for ..."
./bin/spark-submit -v --master yarn-cluster \
  --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-184.108.40.206-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
  --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar \
  --files $SPARK_HOME/conf/hive-site.xml \
  --num-executors 1 --driver-memory 4g --driver-java-options "-XX:MaxPermSize=2G" \
  --executor-memory 2g --executor-cores 1 --queue hdmi-express \
  --class com.ebay.ep.poc.spark.reporting.SparkApp spark_reporting-1.0-SNAPSHOT.jar \
  startDate=2015-02-16 endDate=2015-02-16 input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro subcommand=successevents2 output=/user/dvasthimal/epdatasets/successdetail2

Logs
====
Caused by: java.sql.SQLException: No suitable driver found for jdbc:mysql://hostname:3306/HDB
	at java.sql.DriverManager.getConnection(DriverManager.java:596)
	at java.sql.DriverManager.getConnection(DriverManager.java:187)
	at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
	at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
	... 68 more
...
15/03/27 23:56:08 INFO yarn.Client: Uploading resource file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar -> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
-sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
   61 Fri Oct 17 08:05:36 GMT-07:00 2014 META-INF/services/java.sql.Driver
 3396 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/fabric/jdbc/FabricMySQLDriver.class
  692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class
 1562 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
17817 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver.class
  690 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringReplicationDriver.class
  731 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/ReplicationDriver.class
  336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class

-sh-4.1$ cat conf/hive-site.xml | grep Driver
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
-sh-4.1$
--
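Since the jar clearly contains com.mysql.jdbc.Driver, one way to narrow this down is to check from inside the running driver (or an executor) whether that class is actually visible to the classloader in use; loading the class also runs its static initializer, which is what registers a JDBC driver with DriverManager, so an explicit Class.forName is a commonly suggested workaround for this kind of failure. A minimal sketch (class name hard-coded for illustration):

```java
public class DriverCheck {
    // Returns true if the named class is visible to this classloader.
    static boolean onClasspath(String className) {
        try {
            // For JDBC drivers, loading the class triggers the static
            // initializer that registers it with java.sql.DriverManager.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String cls = "com.mysql.jdbc.Driver";
        System.out.println(cls + (onClasspath(cls) ? " is" : " is NOT") + " on the classpath");
    }
}
```

Running this from the application's main() would show whether the --jars/--driver-class-path plumbing actually delivered the connector to the JVM that creates the metastore connection.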
On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <email@example.com> wrote:
Are you running on yarn?
- If you are running in yarn-client mode, set HADOOP_CONF_DIR to /etc/hive/conf/ (or the directory where your hive-site.xml is located).
- If you are running in yarn-cluster mode, the easiest thing to do is to add --files=/etc/hive/conf/hive-site.xml (or the path for your hive-site.xml) to your spark-submit script.
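Both options amount to making hive-site.xml visible to Spark. A trivial pre-flight check along these lines can save a round of failed submits (the /etc/hive/conf fallback is an assumption, not something Spark itself uses):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HiveSiteCheck {
    // Returns the path to hive-site.xml under confDir if readable, else null.
    static Path findHiveSite(String confDir) {
        Path hiveSite = Paths.get(confDir, "hive-site.xml");
        return Files.isReadable(hiveSite) ? hiveSite : null;
    }

    public static void main(String[] args) {
        // HADOOP_CONF_DIR is what matters in yarn-client mode;
        // /etc/hive/conf is only an assumed default for this sketch.
        String confDir = System.getenv().getOrDefault("HADOOP_CONF_DIR", "/etc/hive/conf");
        Path found = findHiveSite(confDir);
        if (found != null) {
            System.out.println("found " + found);
        } else {
            // Without a hive-site.xml, HiveContext quietly falls back to
            // a fresh local Derby metastore, so existing tables "disappear".
            System.out.println("no hive-site.xml under " + confDir);
        }
    }
}
```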
On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <firstname.lastname@example.org> wrote:
I can recreate the tables, but what about the data? This seems like an obvious feature for Spark SQL to have. People will want to transform tons of data stored in HDFS through Hive from Spark SQL.
The Spark programming guide suggests it's possible:
Spark SQL also supports reading and writing data stored in Apache Hive. .... Configuration of Hive is done by placing your hive-site.xml file in conf/.
For some reason it's not working.
On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <email@example.com> wrote:
It seems Spark SQL accesses some columns beyond those created by Hive. You can always recreate the tables by executing the table creation scripts, but it would be good to avoid recreation.
On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <firstname.lastname@example.org> wrote:
I did copy hive-site.xml from the Hive installation into spark-home/conf. It does have all the metastore connection details: host, username, password, driver, and others.
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>some-password</value>
  <description>password to use against metastore database</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>false</value>
  <description>controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
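To double-check what Spark will actually read from the copied file, the metastore-related properties can be dumped programmatically. A sketch, assuming the javax.jdo.option. prefix is what matters for the JDBC connection (the file path is passed in; MetastoreConf is a hypothetical helper, not part of Hive or Spark):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class MetastoreConf {
    // Extracts name -> value for every <property> whose name starts with prefix.
    static Map<String, String> properties(File hiveSite, String prefix) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(hiveSite);
        Map<String, String> out = new LinkedHashMap<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent();
            if (name.startsWith(prefix)) {
                out.put(name, p.getElementsByTagName("value").item(0).getTextContent());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Point this at $SPARK_HOME/conf/hive-site.xml.
        properties(new File(args[0]), "javax.jdo.option.")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```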
When I attempt to read a Hive table, it does not work: dw_bid does not exist.
I am sure there is a way to read tables stored in HDFS (via Hive) from Spark SQL. Otherwise, how would anyone do analytics, since the source tables are always persisted either directly on HDFS or through Hive?
On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <email@example.com> wrote:
Since Hive and Spark SQL internally use HDFS and the Hive metastore, the only thing you need to change is the processing engine. You can try to bring your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml. (Ensure that the hive-site.xml captures the metastore connection details.)
It's a hack and I haven't tried it, but I have played around with the metastore and it should work.
--
On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <firstname.lastname@example.org> wrote:
I have a few tables that were created in Hive. I want to transform the data stored in these Hive tables using Spark SQL. Is this even possible?
So far I have seen that I can create new tables using the Spark SQL dialect. However, when I run show tables or desc hive_table, it says the table is not found.
I am now wondering whether this is supported in Spark SQL or not.