spark-user mailing list archives

From ÐΞ€ρ@Ҝ (๏̯͡๏) <deepuj...@gmail.com>
Subject Re: Can spark sql read existing tables created in hive
Date Tue, 31 Mar 2015 04:16:01 GMT
I have raised a JIRA, https://issues.apache.org/jira/browse/SPARK-6622, to
track this issue in case it requires a fix in Spark.

On Tue, Mar 31, 2015 at 9:31 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> Hello Lian,
> This guide explains how to install the Hive metastore. The one thing I took
> from it was which mysql-connector-java jar to use: it suggests 5.1.35
> (mysql-connector-java-5.1.35-bin.jar).
>
> When I use that jar:
>
> ./bin/spark-submit -v --master yarn-cluster \
>   --driver-class-path /apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
>   --jars /apache/hadoop/lib/hadoop-lzo-0.6.0.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.35-bin.jar,/home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,$SPARK_HOME/conf/hive-site.xml \
>   --num-executors 1 --driver-memory 4g \
>   --driver-java-options "-XX:MaxPermSize=2G" \
>   --executor-memory 2g --executor-cores 1 \
>   --queue hdmi-express \
>   --class com.ebay.ep.poc.spark.reporting.SparkApp \
>   spark_reporting-1.0-SNAPSHOT.jar \
>   startDate=2015-02-16 endDate=2015-02-16 \
>   input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
>   subcommand=successevents2 \
>   output=/user/dvasthimal/epdatasets/successdetail2
>
> I still get the same error.
>
>
> org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a
> test connection to the given database. JDBC url =
> jdbc:mysql://hostname.vip.company.com:3306/HDB, username = hiveuser.
> Terminating connection pool (set lazyInit to true if you expect to start
> your database after your app). Original Exception: ------
>
> java.sql.SQLException: No suitable driver found for
> jdbc:mysql://hostname.vip.company.com:3306/HDB
>
> at java.sql.DriverManager.getConnection(DriverManager.java:596)
>
> Attached are the full stack trace and logs, in case they reveal anything.
>
> Michael,
> Could you please take some time to look into this?
>
> Regards,
> Deepak
>
>
> On Mon, Mar 30, 2015 at 10:04 PM, Cheng Lian <lian.cs.zju@gmail.com>
> wrote:
>
>>  Ah, sorry, my bad...
>> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
>>
>>
>> On 3/30/15 10:24 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>
>>  Hello Lian,
>> Can you share the URL?
>>
>> On Mon, Mar 30, 2015 at 6:12 PM, Cheng Lian <lian.cs.zju@gmail.com>
>> wrote:
>>
>>>  The "mysql" command-line client doesn't use JDBC to talk to the MySQL
>>> server, so this doesn't verify the JDBC path that Spark uses.
>>>
>>> I think this Hive metastore installation guide from Cloudera may be
>>> helpful. Although this document is for CDH4, the general steps are the
>>> same, and should help you to figure out the relationships here.
>>>
>>> Cheng
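Cheng's point can be made concrete with a check that runs inside a JVM: `java.sql.DriverManager.getDriver(url)` performs the same driver lookup that raises "No suitable driver found for ..." in this thread, which the `mysql` client never touches. The class below is a hypothetical diagnostic (the name `DriverCheck` is an assumption, not part of Spark or Hive); it is a sketch that asks whether any registered driver accepts the metastore URL.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;

/**
 * Hypothetical diagnostic: asks DriverManager whether any registered JDBC
 * driver (visible to this class's loader) accepts the given URL.
 */
public class DriverCheck {

    /** Returns the accepting driver's class name, or null if none accepts the URL. */
    public static String findDriver(String url) {
        try {
            Driver driver = DriverManager.getDriver(url);
            return driver.getClass().getName();
        } catch (SQLException e) {
            // DriverManager reports "no suitable driver" by throwing SQLException.
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the javax.jdo.option.ConnectionURL value from hive-site.xml.
        String url = args.length > 0 ? args[0] : "jdbc:mysql://localhost:3306/HDB";
        String driver = findDriver(url);
        if (driver == null) {
            System.out.println("No registered JDBC driver accepts " + url);
        } else {
            System.out.println(url + " is accepted by " + driver);
        }
    }
}
```

For example, `java -cp /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:. DriverCheck jdbc:mysql://hostname:3306/HDB`. If this finds the MySQL driver but Spark still fails, the jar itself is probably fine and the question becomes how Spark loads it (an inference, not something confirmed in this thread).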
>>>
>>>
>>> On 3/30/15 3:33 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>>>
>>>  I am able to connect to the MySQL Hive metastore from the client cluster
>>> machine.
>>>
>>>  -sh-4.1$ mysql --user=hiveuser --password=pass --host=hostname.vip.company.com
>>> Welcome to the MySQL monitor.  Commands end with ; or \g.
>>> Your MySQL connection id is 9417286
>>> Server version: 5.5.12-eb-5.5.12-log MySQL-eb 5.5.12, Revision 3492
>>> Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
>>> Oracle is a registered trademark of Oracle Corporation and/or its
>>> affiliates. Other names may be trademarks of their respective
>>> owners.
>>> Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
>>> mysql> use eBayHDB;
>>> Reading table information for completion of table and column names
>>> You can turn off this feature to get a quicker startup with -A
>>>
>>> Database changed
>>> mysql> show tables;
>>> +---------------------------+
>>> | Tables_in_HDB             |
>>> +---------------------------+
>>>
>>>
>>>  Regards,
>>> Deepak
>>>
>>>
>>> On Sat, Mar 28, 2015 at 12:35 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>> wrote:
>>>
>>>> Yes, I am using yarn-cluster, and I did add it via --files. I get a "No
>>>> suitable driver found" error.
>>>>
>>>> Please share a spark-submit command that includes the MySQL jar
>>>> containing the driver class used to connect to the Hive MySQL metastore.
>>>>
>>>> Even after including it through
>>>>
>>>>   --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>> OR (AND)
>>>>   --jars /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar
>>>>
>>>> I keep getting "No suitable driver found for".
>>>>
>>>>
>>>> Command
>>>> ========
>>>>
>>>> ./bin/spark-submit -v --master yarn-cluster \
>>>>   --driver-class-path /home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar:/apache/hadoop/share/hadoop/common/hadoop-common-2.4.1-EBAY-2.jar:/apache/hadoop/lib/hadoop-lzo-0.6.0.jar:/apache/hadoop-2.4.1-2.1.3.0-2-EBAY/share/hadoop/yarn/lib/guava-11.0.2.jar \
>>>>   --jars /home/dvasthimal/spark1.3/spark-avro_2.10-1.0.0.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar,/home/dvasthimal/spark1.3/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar,/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar \
>>>>   --files $SPARK_HOME/conf/hive-site.xml \
>>>>   --num-executors 1 --driver-memory 4g \
>>>>   --driver-java-options "-XX:MaxPermSize=2G" \
>>>>   --executor-memory 2g --executor-cores 1 \
>>>>   --queue hdmi-express \
>>>>   --class com.ebay.ep.poc.spark.reporting.SparkApp \
>>>>   spark_reporting-1.0-SNAPSHOT.jar \
>>>>   startDate=2015-02-16 endDate=2015-02-16 \
>>>>   input=/user/dvasthimal/epdatasets/successdetail1/part-r-00000.avro \
>>>>   subcommand=successevents2 \
>>>>   output=/user/dvasthimal/epdatasets/successdetail2
>>>>
>>>> Logs
>>>> ====
>>>>
>>>>  Caused by: java.sql.SQLException: No suitable driver found for
>>>> jdbc:mysql://hostname:3306/HDB
>>>>  at java.sql.DriverManager.getConnection(DriverManager.java:596)
>>>>  at java.sql.DriverManager.getConnection(DriverManager.java:187)
>>>>  at
>>>> com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
>>>>  at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
>>>>  ... 68 more
>>>>  ...
>>>> ...
>>>>
>>>> 15/03/27 23:56:08 INFO yarn.Client: Uploading resource
>>>> file:/home/dvasthimal/spark1.3/mysql-connector-java-5.1.34.jar ->
>>>> hdfs://apollo-NN:8020/user/dvasthimal/.sparkStaging/application_1426715280024_119815/mysql-connector-java-5.1.34.jar
>>>>
>>>> ...
>>>>
>>>> ...
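The upload log shows the connector jar was shipped, yet the lookup still fails. One possible explanation, offered as an assumption rather than a diagnosis of Spark 1.3: `DriverManager` only hands out a driver whose class is visible from the caller's class loader, so a jar loaded by a separate child loader (as `--jars` may be) can be present without being usable by code loaded elsewhere, such as the BoneCP pool in the trace. The hypothetical helper below reports whether a class name resolves through the system and context class loaders.

```java
/**
 * Hypothetical diagnostic: reports whether a class (e.g. a JDBC driver)
 * can be resolved through a given class loader.
 */
public class DriverVisibility {

    /** True if className resolves through loader (a null loader means the bootstrap loader). */
    public static boolean visible(String className, ClassLoader loader) {
        try {
            // initialize=false: only checks that the class resolves, without running static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose dependencies cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = args.length > 0 ? args[0] : "com.mysql.jdbc.Driver";
        ClassLoader system = ClassLoader.getSystemClassLoader();
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("system loader sees " + driver + ": " + visible(driver, system));
        System.out.println("context loader sees " + driver + ": " + visible(driver, context));
    }
}
```

Logging the same two checks from the application's own `main` before touching Hive would show, on the actual YARN driver, which loader can see `com.mysql.jdbc.Driver`.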
>>>>
>>>>
>>>>
>>>> -sh-4.1$ jar -tvf ../mysql-connector-java-5.1.34.jar | grep Driver
>>>>     61 Fri Oct 17 08:05:36 GMT-07:00 2014 META-INF/services/java.sql.Driver
>>>>   3396 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/fabric/jdbc/FabricMySQLDriver.class
>>>>    692 Fri Oct 17 08:05:22 GMT-07:00 2014 com/mysql/jdbc/Driver.class
>>>>   1562 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver$ConnectionPhantomReference.class
>>>>  17817 Fri Oct 17 08:05:20 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringDriver.class
>>>>    690 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/NonRegisteringReplicationDriver.class
>>>>    731 Fri Oct 17 08:05:24 GMT-07:00 2014 com/mysql/jdbc/ReplicationDriver.class
>>>>    336 Fri Oct 17 08:05:24 GMT-07:00 2014 org/gjt/mm/mysql/Driver.class
>>>> -sh-4.1$ cat conf/hive-site.xml | grep Driver
>>>>    <name>javax.jdo.option.ConnectionDriverName</name>
>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>   <description>Driver class name for a JDBC metastore</description>
>>>> -sh-4.1$
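The listing shows the jar carries `META-INF/services/java.sql.Driver`, the JDBC 4 `ServiceLoader` file through which `DriverManager` discovers drivers automatically. As a programmatic version of the `jar -tvf | grep` check, the hypothetical helper below (its name is an assumption) prints the driver class names that file declares, using only `java.util.jar`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarDriverServices {

    // JDBC 4 drivers are discovered by ServiceLoader through this entry.
    static final String SERVICE_ENTRY = "META-INF/services/java.sql.Driver";

    /** Returns the driver class names the jar declares for auto-loading (empty if none). */
    public static List<String> declaredDrivers(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(SERVICE_ENTRY);
            if (entry == null) {
                return names;
            }
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // Service files allow '#' comments and blank lines; skip both.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        names.add(line);
                    }
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        for (String jarPath : args) {
            System.out.println(jarPath + " declares " + declaredDrivers(jarPath));
        }
    }
}
```

A non-empty result only shows the jar is well formed; whether the running JVM's class loader actually sees it is a separate question.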
>>>>
>>>>  --
>>>>  Deepak
>>>>
>>>>
>>>> On Sat, Mar 28, 2015 at 1:06 AM, Michael Armbrust <
>>>> michael@databricks.com> wrote:
>>>>
>>>>> Are you running on yarn?
>>>>>
>>>>>  - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
>>>>> /etc/hive/conf/ (or the directory where your hive-site.xml is located).
>>>>>  - If you are running in yarn-cluster mode, the easiest thing to do is
>>>>> to add --files=/etc/hive/conf/hive-site.xml (or the path for your
>>>>> hive-site.xml) to your spark-submit script.
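Both suggestions aim at the same outcome as I understand it: getting hive-site.xml onto the classpath of the driver JVM, where Hive's configuration code looks it up as a class-loader resource (my understanding of `HiveConf`, to be treated as an assumption). The hypothetical helper below asks a class loader where, if anywhere, it finds that resource; logging it from the driver would confirm whether the file actually arrived.

```java
import java.net.URL;

/**
 * Hypothetical diagnostic: where does a class loader find hive-site.xml?
 */
public class HiveSiteLocator {

    /** Returns the resource URL, or null if not visible (a null loader means the system loader). */
    public static URL locate(String resource, ClassLoader loader) {
        return loader == null ? ClassLoader.getSystemResource(resource) : loader.getResource(resource);
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        URL url = locate("hive-site.xml", context);
        System.out.println(url == null
                ? "hive-site.xml is not on the classpath"
                : "hive-site.xml found at " + url);
    }
}
```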
>>>>>
>>>>> On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I can recreate the tables, but what about the data? This looks like an
>>>>>> obvious feature for Spark SQL to have: people will want to transform
>>>>>> tons of data stored in HDFS through Hive from Spark SQL.
>>>>>>
>>>>>> The Spark SQL programming guide suggests it's possible:
>>>>>>
>>>>>>
>>>>>> Spark SQL also supports reading and writing data stored in Apache
>>>>>> Hive <http://hive.apache.org/>. .... Configuration of Hive is done
>>>>>> by placing your hive-site.xml file in conf/.
>>>>>>
>>>>>> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>>>>>>
>>>>>> For some reason it's not working.
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <
>>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>>
>>>>>>> It seems Spark SQL accesses some more columns apart from those
>>>>>>> created by Hive.
>>>>>>>
>>>>>>> You can always recreate the tables by executing the table creation
>>>>>>> scripts, but it would be good to avoid recreating them.
>>>>>>>
>>>>>>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I did copy hive-site.xml from the Hive installation into
>>>>>>>> spark-home/conf. It does have all the metastore connection details:
>>>>>>>> host, username, password, driver and others.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  Snippet
>>>>>>>> ======
>>>>>>>>
>>>>>>>>
>>>>>>>>  <configuration>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>>>>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>>>>>>   <value>com.mysql.jdbc.Driver</value>
>>>>>>>>   <description>Driver class name for a JDBC metastore</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>>>>>>   <value>hiveuser</value>
>>>>>>>>   <description>username to use against metastore
>>>>>>>> database</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>>>>>>   <value>some-password</value>
>>>>>>>>   <description>password to use against metastore
>>>>>>>> database</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>hive.metastore.local</name>
>>>>>>>>   <value>false</value>
>>>>>>>>   <description>controls whether to connect to remote metastore
>>>>>>>> server or open a new metastore server in Hive Client JVM</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  <property>
>>>>>>>>   <name>hive.metastore.warehouse.dir</name>
>>>>>>>>   <value>/user/hive/warehouse</value>
>>>>>>>>   <description>location of default database for the
>>>>>>>> warehouse</description>
>>>>>>>> </property>
>>>>>>>>
>>>>>>>>  ......
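To confirm what a JVM would actually read from this file, the hypothetical helper below (its name and behavior are assumptions of this sketch) parses a hive-site.xml with the JDK's DOM parser and returns its `<name>`/`<value>` pairs. The `javax.jdo.option.*` keys it prints are the ones in the snippet above; the password is deliberately not printed.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads <property><name/><value/></property> pairs from a Hadoop-style config file. */
public class HiveSiteProperties {

    public static Map<String, String> read(File hiveSite) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(hiveSite);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                props.put(name, childText(property, "value"));
            }
        }
        return props;
    }

    /** Trimmed text of the first descendant element with the given tag, or null if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> props = read(new File(args.length > 0 ? args[0] : "conf/hive-site.xml"));
        // JDO connection settings used to open the metastore database.
        System.out.println("URL:    " + props.get("javax.jdo.option.ConnectionURL"));
        System.out.println("Driver: " + props.get("javax.jdo.option.ConnectionDriverName"));
        System.out.println("User:   " + props.get("javax.jdo.option.ConnectionUserName"));
    }
}
```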
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> When I attempt to read a Hive table, it does not work: dw_bid does
>>>>>>>> not exist.
>>>>>>>>
>>>>>>>> I am sure there is a way to read tables stored in HDFS (Hive) from
>>>>>>>> Spark SQL. Otherwise, how would anyone do analytics, since the source
>>>>>>>> tables are always persisted either directly on HDFS or through Hive?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <
>>>>>>>> arush@sigmoidanalytics.com> wrote:
>>>>>>>>
>>>>>>>>> Hive and Spark SQL both use HDFS and the Hive metastore internally;
>>>>>>>>> the only thing you want to change is the processing engine. You can
>>>>>>>>> try to bring your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml
>>>>>>>>> (ensure that the hive-site.xml captures the metastore connection
>>>>>>>>> details).
>>>>>>>>>
>>>>>>>>> It's a hack; I haven't tried it. I have played around with the
>>>>>>>>> metastore and it should work.
>>>>>>>>>
>>>>>>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
>>>>>>>>> <deepujain@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I have a few tables that were created in Hive. I want to transform
>>>>>>>>>> the data stored in these Hive tables using Spark SQL. Is this even
>>>>>>>>>> possible?
>>>>>>>>>>
>>>>>>>>>> So far I have seen that I can create new tables using the Spark SQL
>>>>>>>>>> dialect. However, when I run "show tables" or "desc hive_table", it
>>>>>>>>>> says the table is not found.
>>>>>>>>>>
>>>>>>>>>> I am now wondering whether this support is present in Spark SQL or
>>>>>>>>>> not.
>>>>>>>>>>
>>>>>>>>>   --
>>>>>>>>>
>>>>>>>>> [image: Sigmoid Analytics]
>>>>>>>>> <http://htmlsig.com/www.sigmoidanalytics.com>
>>>>>>>>>
>>>>>>>>> *Arush Kharbanda* || Technical Teamlead
>>>>>>>>>
>>>>>>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>>>>>>