spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Hive and distributed sql engine
Date Mon, 25 Jul 2016 07:33:59 GMT
In order to use an existing pg UDF, you may create a view in pg that calls
the UDF and expose that view to Hive.
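
For example, a minimal sketch of that approach, assuming Spark 2.x (on 1.x
the equivalent is sqlContext.read plus registerTempTable), the Postgres JDBC
driver on the classpath, and made-up names for the view, URL and credentials:

// Read a Postgres view that wraps the existing pg UDF, then expose it
// to HiveQL clients of the Thrift server. All names are placeholders.
val enriched = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://pg-host:5432/reportdb")
  .option("dbtable", "reporting.orders_enriched") // pg view calling the UDF
  .option("user", "reporter")
  .option("password", sys.env("PG_PASSWORD"))
  .load()

enriched.createOrReplaceTempView("orders_enriched")

Since "dbtable" also accepts a parenthesised subquery such as
"(select id, total from orders where total > 0) t", you are not limited to
mapping the full table. Note a temp view is only visible to a Thrift server
sharing the same session, e.g. one started with
HiveThriftServer2.startWithContext.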
Spark-to-database connections are opened from each executor, so you need a
connection, or a pool of connections, per executor. Tasks running in the
same executor JVM can share a connection pool.
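
For JDBC work you issue yourself (e.g. inside foreachPartition), a common
pattern is a pool held in a singleton object, which the JVM initialises
lazily once per executor; a rough sketch assuming HikariCP is on the
classpath (URL, credentials and pool size are placeholders):

import java.sql.Connection
import com.zaxxer.hikari.{HikariConfig, HikariDataSource}

// One instance per executor JVM: every task scheduled on that executor
// borrows from this pool instead of opening a fresh connection.
object ExecutorSidePool {
  lazy val dataSource: HikariDataSource = {
    val cfg = new HikariConfig()
    cfg.setJdbcUrl("jdbc:postgresql://pg-host:5432/reportdb") // placeholder
    cfg.setUsername("reporter")
    cfg.setPassword(sys.env("PG_PASSWORD"))
    cfg.setMaximumPoolSize(4) // bounds connections per executor, not per task
    new HikariDataSource(cfg)
  }

  def withConnection[A](f: Connection => A): A = {
    val conn = dataSource.getConnection
    try f(conn) finally conn.close() // close() returns it to the pool
  }
}

Calling ExecutorSidePool.withConnection from inside foreachPartition then
reuses connections across tasks on the same executor. This only helps for
code paths you control: Spark's built-in JDBC source opens its own
connection per partition when scanning a table.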

Best
Ayan
On 25 Jul 2016 16:48, "Marco Colombo" <ing.marco.colombo@gmail.com> wrote:

> Hi all!
> Among other use cases, I want to use Spark as a distributed SQL engine
> via the Thrift server.
> I have some tables in Postgres and Cassandra: I need to expose them via
> Hive for custom reporting.
> The basic implementation is simple and works, but I have some concerns and
> open questions:
> - Is there a better approach than mapping a temp table as a select of the
> full table?
> - What about query setup cost? I mean, is there a way to avoid db
> connection setup costs by using a pre-created connection pool?
> - Is it possible from HiveQL to use functions defined in the pg database,
> or do I have to rewrite them as UDAFs?
>
> Thanks!
>
>
>
> --
> Ing. Marco Colombo
>
