spark-user mailing list archives

From Marco Colombo <ing.marco.colo...@gmail.com>
Subject Re: Hive and distributed sql engine
Date Mon, 25 Jul 2016 07:39:41 GMT
Thanks. That's what I was thinking.
But how do I set up a connection per worker?


On Monday, 25 July 2016, ayan guha <guha.ayan@gmail.com> wrote:

> To use your existing pg UDFs, you can create a view in pg and expose
> the view to Hive.
> Spark-to-database connections are opened from each executor, so you need
> a connection, or a pool of connections, per worker. Executors on the same
> worker can share a connection pool.
>
> Best
> Ayan
> On 25 Jul 2016 16:48, "Marco Colombo" <ing.marco.colombo@gmail.com> wrote:
>
>> Hi all!
>> Among other use cases, I want to use Spark as a distributed SQL engine
>> via the Thrift server.
>> I have some tables in Postgres and Cassandra, and I need to expose them via
>> Hive for custom reporting.
>> The basic implementation is simple and works, but I have some concerns and
>> open questions:
>> - Is there a better approach than mapping a temp table as a select
>> of the full table?
>> - What about query setup cost? I mean, is there a way to avoid DB
>> connection setup costs by using a pre-created connection pool?
>> - Is it possible to use functions defined in the pg database from HiveQL,
>> or do I have to rewrite them as UDAFs?
>>
>> Thanks!
>>
>>
>>
>> --
>> Ing. Marco Colombo
>>
>

-- 
Ing. Marco Colombo
