spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Can spark sql read existing tables created in hive
Date Fri, 27 Mar 2015 19:36:25 GMT
Are you running on yarn?

 - If you are running in yarn-client mode, set HADOOP_CONF_DIR to
/etc/hive/conf/ (or the directory where your hive-site.xml is located).
 - If you are running in yarn-cluster mode, the easiest thing to do is to
add --files=/etc/hive/conf/hive-site.xml (or the path for your
hive-site.xml) to your spark-submit script.
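
To make the two cases above concrete, here is a minimal Python sketch that assembles the spark-submit invocation for each mode. The helper name build_spark_submit and the app.jar argument are hypothetical, and it assumes hive-site.xml sits under /etc/hive/conf. The --master yarn-client / yarn-cluster values follow the Spark 1.x syntax used in this thread.

```python
import os
import posixpath


def build_spark_submit(mode, app_jar, hive_conf_dir="/etc/hive/conf"):
    """Return (env, argv) for a spark-submit call in the given YARN mode.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if mode not in ("yarn-client", "yarn-cluster"):
        raise ValueError("unsupported mode: %s" % mode)

    env = dict(os.environ)
    argv = ["spark-submit", "--master", mode]

    if mode == "yarn-client":
        # The driver runs locally, so point it at the directory
        # that contains hive-site.xml.
        env["HADOOP_CONF_DIR"] = hive_conf_dir
    else:
        # The driver runs on the cluster, so ship hive-site.xml with the job.
        argv += ["--files", posixpath.join(hive_conf_dir, "hive-site.xml")]

    argv.append(app_jar)  # assumed application jar, not from the thread
    return env, argv
```

The returned pair could then be handed to something like subprocess.call(argv, env=env) on a machine where spark-submit is on the PATH.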

On Fri, Mar 27, 2015 at 5:42 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com> wrote:

> I can recreate the tables, but what about the data? This looks like an obvious
> feature that Spark SQL must have. People will want to transform tons
> of data stored in HDFS through Hive from Spark SQL.
>
> The Spark programming guide suggests it's possible.
>
>
> Spark SQL also supports reading and writing data stored in Apache Hive
> <http://hive.apache.org/>.  .... Configuration of Hive is done by placing
> your hive-site.xml file in conf/.
> https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#hive-tables
>
> For some reason it's not working.
>
>
> On Fri, Mar 27, 2015 at 3:35 PM, Arush Kharbanda <
> arush@sigmoidanalytics.com> wrote:
>
>> It seems Spark SQL accesses some more columns apart from those created by
>> Hive.
>>
>> You can always recreate the tables by executing the table creation
>> scripts, but it would be good to avoid recreation.
>>
>> On Fri, Mar 27, 2015 at 3:20 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>> wrote:
>>
>>> I did copy hive-conf.xml from the Hive installation into spark-home/conf. It
>>> does have all the metastore connection details: host, username, password,
>>> driver and others.
>>>
>>>
>>>
>>> Snippet
>>> ======
>>>
>>>
>>> <configuration>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionURL</name>
>>>   <value>jdbc:mysql://host.vip.company.com:3306/HDB</value>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>>   <value>com.mysql.jdbc.Driver</value>
>>>   <description>Driver class name for a JDBC metastore</description>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionUserName</name>
>>>   <value>hiveuser</value>
>>>   <description>username to use against metastore database</description>
>>> </property>
>>>
>>> <property>
>>>   <name>javax.jdo.option.ConnectionPassword</name>
>>>   <value>some-password</value>
>>>   <description>password to use against metastore database</description>
>>> </property>
>>>
>>> <property>
>>>   <name>hive.metastore.local</name>
>>>   <value>false</value>
>>>   <description>controls whether to connect to remote metastore server or
>>> open a new metastore server in Hive Client JVM</description>
>>> </property>
>>>
>>> <property>
>>>   <name>hive.metastore.warehouse.dir</name>
>>>   <value>/user/hive/warehouse</value>
>>>   <description>location of default database for the
>>> warehouse</description>
>>> </property>
>>>
>>> ......
>>>
>>>
>>>
>>> When I attempt to read a Hive table, it does not work: dw_bid does not
>>> exist.
>>>
>>> I am sure there is a way to read tables stored in HDFS (Hive) from Spark
>>> SQL. Otherwise how would anyone do analytics since the source tables are
>>> always either persisted directly on HDFS or through Hive.
>>>
>>>
>>> On Fri, Mar 27, 2015 at 1:15 PM, Arush Kharbanda <
>>> arush@sigmoidanalytics.com> wrote:
>>>
>>>> Hive and Spark SQL both use HDFS and the Hive metastore internally, so the
>>>> only thing you want to change is the processing engine. You can try
>>>> copying your hive-site.xml to %SPARK_HOME%/conf/hive-site.xml. (Ensure that
>>>> hive-site.xml contains the metastore connection details.)
>>>>
>>>> It's a hack; I haven't tried it. I have played around with the metastore
>>>> and it should work.
>>>>
>>>> On Fri, Mar 27, 2015 at 12:04 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) <deepujain@gmail.com>
>>>> wrote:
>>>>
>>>>> I have a few tables that were created in Hive. I want to transform the data
>>>>> stored in these Hive tables using Spark SQL. Is this even possible?
>>>>>
>>>>> So far I have seen that I can create new tables using the Spark SQL
>>>>> dialect. However, when I run show tables or desc hive_table, it says
>>>>> table not found.
>>>>>
>>>>> I am now wondering whether this support is present in Spark SQL.
>>>>>
>>>>> --
>>>>> Deepak
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Arush Kharbanda* || Technical Teamlead
>>>>
>>>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>>>
>>>
>>>
>>>
>>> --
>>> Deepak
>>>
>>>
>>
>>
>> --
>>
>> *Arush Kharbanda* || Technical Teamlead
>>
>> arush@sigmoidanalytics.com || www.sigmoidanalytics.com
>>
>
>
>
> --
> Deepak
>
>
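
As a quick sanity check on the configuration discussed in the quoted thread, the following Python sketch reports which metastore connection properties are missing from a hive-site.xml. The function names are hypothetical, and the list of required properties is an assumption taken from the snippet quoted in the thread, not an authoritative list.

```python
import xml.etree.ElementTree as ET

# Assumed set of properties to check, copied from the quoted snippet.
REQUIRED = (
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
)


def read_properties(xml_text):
    """Parse Hadoop-style <property> entries into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_metastore_properties(xml_text, required=REQUIRED):
    """Return the required property names that are absent or empty."""
    props = read_properties(xml_text)
    return [name for name in required if not props.get(name)]
```

Reading the file that Spark actually picks up (for example the copy under its conf directory) and passing its contents to missing_metastore_properties should return an empty list if all four entries are present.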
