spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Scala Spark / Shark: How to access existing Hive tables in Hortonworks?
Date Fri, 02 May 2014 13:37:11 GMT
Shark communicates via JDBC with the Hive *meta* server. There is no such
thing as a Hive server; Hive stores all its data in Hadoop HDFS, which is
where Shark pulls it from.

Shark works on nested select queries.
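Whatever Shark does with nested selects, the three nested selects in the quoted message can be written as one self-join on movie_title. The following Scala code is a minimal sketch under stated assumptions. The table name `ratings`, the `Rating` case class, the `CoViewers` object and the sample rows are all hypothetical. Plain Scala collections stand in for the JDBC calls, so each simulated "query" is only counted and never sent anywhere. The sketch shows that the per-user, per-movie loop issues 1 + (number of users) + (number of user/movie pairs) queries. The join computes the same result in a single pass.

```scala
// Hypothetical row type mirroring the quoted table layout:
// user_id, movie_title, rating, date.
final case class Rating(userId: Int, movieTitle: String, rating: Int, date: String)

object CoViewers {

  // One HiveQL self-join equivalent to the three nested selects.
  // `ratings` is an assumed table name, not taken from the thread.
  val singleQuery: String =
    """SELECT DISTINCT a.user_id, a.movie_title, b.user_id AS co_viewer
      |FROM ratings a JOIN ratings b ON (a.movie_title = b.movie_title)""".stripMargin

  // Nested approach: each inner select is simulated in memory and counted,
  // standing in for a separate round trip over JDBC.
  def nested(rows: Seq[Rating]): (Set[(Int, String, Int)], Int) = {
    var queries = 0
    def select[A](run: => A): A = { queries += 1; run }

    val users = select(rows.map(_.userId).distinct) // 1) all users
    val result = for {
      u <- users
      // 2) one query per user: movies this user saw
      m <- select(rows.filter(_.userId == u).map(_.movieTitle).distinct)
      // 3) one query per (user, movie): users who saw that movie
      v <- select(rows.filter(_.movieTitle == m).map(_.userId).distinct)
    } yield (u, m, v)
    (result.toSet, queries)
  }

  // Join approach: index viewers by movie once, then pair every row with
  // the viewers of its movie. Same triples as the self-join query.
  def joined(rows: Seq[Rating]): Set[(Int, String, Int)] = {
    val viewersByMovie: Map[String, Set[Int]] =
      rows.groupBy(_.movieTitle).map { case (m, rs) => m -> rs.map(_.userId).toSet }
    (for {
      r <- rows
      v <- viewersByMovie(r.movieTitle)
    } yield (r.userId, r.movieTitle, v)).toSet
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows for illustration only.
    val rows = Seq(
      Rating(1, "Alien", 5, "2014-04-01"),
      Rating(1, "Brazil", 4, "2014-04-02"),
      Rating(2, "Alien", 3, "2014-04-03"),
      Rating(3, "Brazil", 5, "2014-04-04")
    )
    val (viaNested, queryCount) = nested(rows)
    println(s"nested: $queryCount queries, ${viaNested.size} rows")
    println(s"same result: ${viaNested == joined(rows)}")
  }
}
```

Running `main` prints `nested: 8 queries, 8 rows` and `same result: true`. With Hive on MapReduce, a query using DISTINCT typically launches its own job. The total time of the nested approach therefore likely tracks the query count rather than the 20 rows. On a real table, `singleQuery` would be sent once over the existing JDBC connection instead.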



Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Sat, Apr 26, 2014 at 2:52 AM, Darq Moth <darqmoth@gmail.com> wrote:

> Thanks!
> For now I use JDBC from Scala to get data from Hive.  In Hive I have a
> simple table with 20 rows in the following format:
>
> user_id, movie_title, rating, date
>
> I run three nested select queries:
> 1) select distinct user_id
>      2) for each user_id:
>          select distinct movie_title  //select all movies that the user saw
>             3) for each movie_title:
>                 select distinct user_id  //select all users who saw this movie
>
> On a local Hive table with 20 rows, these nested queries take 26 minutes!
>
> Questions:
> 1) Will Shark optimize nested select requests, or will it just issue the
> same selects over JDBC?
> 2) What wire protocol will Shark use to communicate with a remote Hive
> server?
>
>
> On Sat, Apr 26, 2014 at 12:35 AM, Mayur Rustagi <mayur.rustagi@gmail.com> wrote:
>
>> You have to configure Shark to access the Hortonworks Hive metastore
>> (HCatalog?). You will then see the tables in the Shark shell and can run
>> queries as normal; Shark will use Spark to process your queries.
>>
>>
>>
>>
>> On Sat, Apr 26, 2014 at 2:00 AM, Darq Moth <darqmoth@gmail.com> wrote:
>>
>>> I am trying to find some docs or a description of the approach to this
>>> subject; please help. I have Hadoop 2.2.0 from Hortonworks installed, with
>>> some existing Hive tables I need to query. Hive SQL runs extremely and
>>> unreasonably slowly, on a single node as well as on the cluster. I hope
>>> Shark will be faster.
>>>
>>> From the Spark/Shark docs I cannot figure out how to make Shark work with
>>> existing Hive tables. Any ideas how to achieve this? Thanks!
>>>
>>
>>
>
