spark-user mailing list archives

From: Sebastian Piu
Subject: Thrift Server as JDBC endpoint
Date: Wed, 15 Mar 2017 12:37:01 GMT

Hi all,

I'm doing some research on the best ways to expose data created by some of
our Spark jobs so that it can be consumed by a client (a web UI).

The data we need to serve might be huge, but we can control the type of
queries that are submitted, e.g.:
* Limit the number of results
* Only accept SELECT statements (i.e. read-only)
* Only expose some pre-calculated datasets, i.e. queries always go to a
particular partition, with no joins etc.
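
To make those restrictions concrete, here is a rough sketch in Python of
a guard that could sit between the web UI and the JDBC endpoint. Rather
than accepting raw SQL, it builds the statement from a few validated
parameters, so only a single-table SELECT with a capped LIMIT can ever be
produced. The dataset names, the partition column `dt`, the date-shaped
partition value, and the 1000-row cap are all made-up placeholders for
illustration, not our actual setup.

```python
from datetime import datetime

# Hypothetical cap on rows returned per request.
MAX_ROWS = 1000

# Hypothetical allowlist: exposed dataset -> its partition column.
DATASETS = {"daily_summary": "dt", "user_totals": "dt"}


def build_query(dataset: str, partition: str, limit: int = MAX_ROWS) -> str:
    """Build a read-only, single-partition SELECT from validated inputs."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if limit < 1:
        raise ValueError("limit must be positive")

    # Round-trip the value through strptime/strftime so that only a
    # canonical YYYY-MM-DD string ever reaches the SQL text; anything
    # else (including injection attempts) raises ValueError.
    day = datetime.strptime(partition, "%Y-%m-%d").strftime("%Y-%m-%d")

    # Clamp the requested limit to the configured maximum.
    rows = min(limit, MAX_ROWS)
    column = DATASETS[dataset]
    return f"SELECT * FROM {dataset} WHERE {column} = '{day}' LIMIT {rows}"
```

For example, `build_query("daily_summary", "2017-03-15", 50)` yields a
statement that could then be sent over JDBC, while an unknown dataset or
a malformed partition value is rejected before anything reaches the
server.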

In terms of latency, the lower the better, but we don't have any extreme
requirements such as sub-second responses; stability matters far more to us.

Is the Thrift Server stable for this kind of use case? How does it perform
under concurrency? Is it better to run several instances behind a load
balancer, or a single one with more resources?

I'd be interested in hearing about any experiences from people using this
in production environments.