spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Low Latency SQL query
Date Wed, 02 Dec 2015 00:31:31 GMT
Right, you can't expect a completely cold first query to execute faster
than the data can be retrieved from the underlying datastore.  After that,
lowest latency query performance is largely a matter of caching -- for
which Spark provides at least partial solutions.
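
A minimal sketch of that cold-versus-cached pattern, not taken from the thread. It uses plain Python, with an in-memory SQLite table standing in for the underlying datastore. The table name `products`, the sample rows, and the `CachedTable` helper are all hypothetical. In Spark itself the analogous step would be calling `cache()` or `persist()` on the loaded DataFrame or RDD.

```python
import sqlite3
from collections import defaultdict


def make_datastore():
    """Build a tiny in-memory SQLite table standing in for the real RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product_family TEXT, product TEXT, cost REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        [
            ("Tools", "Hammer", 4.0),  # hypothetical sample data
            ("Tools", "Hammer", 6.0),
            ("Tools", "Saw", 10.0),
            ("Toys", "Kite", 5.0),
        ],
    )
    return conn


class CachedTable:
    """Fetch rows from the datastore once, then answer queries from memory."""

    def __init__(self, conn):
        self._conn = conn
        self._rows = None  # filled by the first (cold) query
        self.fetches = 0   # number of times the datastore was actually read

    def rows(self):
        if self._rows is None:
            # Cold path: this cannot be faster than the datastore itself.
            cur = self._conn.execute(
                "SELECT product_family, product, cost FROM products"
            )
            self._rows = cur.fetchall()
            self.fetches += 1
        return self._rows

    def rollup_cost(self):
        """Sum cost per (family, product), per family, and overall."""
        totals = defaultdict(float)
        for family, product, cost in self.rows():
            totals[(family, product)] += cost  # finest level
            totals[(family, None)] += cost     # subtotal per family
            totals[(None, None)] += cost       # grand total
        return dict(totals)


table = CachedTable(make_datastore())
first = table.rollup_cost()   # cold: reads the datastore
second = table.rollup_cost()  # warm: served from the cached rows
```

Only the first call pays the retrieval cost; every later aggregation runs against the rows already held in memory, which is the part a cache can speed up.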

On Tue, Dec 1, 2015 at 4:27 PM, Michal Klos <michal.klos81@gmail.com> wrote:

> You should consider Presto for this use case. If you want fast "first
> query" times, it is a better fit.
>
> I think Spark SQL will catch up at some point, but if you are not running
> multiple queries against data cached in RDDs and you need low latency, it
> may not be a good fit.
>
> M
>
> On Dec 1, 2015, at 7:23 PM, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>
> OK, so the latency problem arises because I'm using SQL as the source?
> How about CSV, Hive, or another source?
>
> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <mark@clearstorydata.com>
> wrote:
>
>> It is not designed for interactive queries.
>>
>>
>> You might want to ask the designers of Spark, Spark SQL, and particularly
>> some things built on top of Spark (such as BlinkDB) about their intent with
>> regard to interactive queries.  Interactive queries are not the only
>> designed use of Spark, but it is going too far to claim that Spark is not
>> designed at all to handle interactive queries.
>>
>> That being said, I think that you are correct to question the wisdom of
>> expecting lowest-latency query response from Spark using SQL (sic,
>> presumably an RDBMS is intended) as the datastore.
>>
>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>>
>>> Hmm, it will never be faster than SQL if you use SQL as the underlying
>>> storage. Spark is (currently) an in-memory batch engine for iterative
>>> machine learning workloads. It is not designed for interactive queries.
>>> Currently Hive is moving in the direction of interactive queries.
>>> Alternatives are Phoenix on HBase or Impala.
>>>
>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>
>>> Yes,
>>> The use case would be:
>>> Have Spark in a service (I haven't investigated this yet); through API
>>> calls to this service we perform some aggregations over data in SQL. We are
>>> already doing this with an in-house implementation.
>>>
>>> Nothing complicated: for instance, a table with Product, Product Family,
>>> cost, price, etc., with columns acting as Dimensions and Measures.
>>>
>>> I want Spark to query that table and perform a kind of rollup, with
>>> cost as the Measure and Product and Product Family as the Dimensions.
>>>
>>> With only 3 columns, it takes about 20s to perform that query and the
>>> aggregation; the same query run directly against the database, grouping by
>>> those columns, takes about 1s.
>>>
>>> regards
>>>
>>>
>>>
>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfranke@gmail.com>
>>> wrote:
>>>
>>>> can you elaborate more on the use case?
>>>>
>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > I'd like to use Spark to perform some transformations over data
>>>> > stored in SQL, but I need low latency. I'm running some tests and
>>>> > finding that Spark context creation and querying the data over SQL
>>>> > take too long.
>>>> >
>>>> > Any ideas for speeding up the process?
>>>> >
>>>> > regards.
>>>> >
>>>> > --
>>>> > Ing. Ivaldi Andres
>>>>
>>>
>>>
>>>
>>> --
>>> Ing. Ivaldi Andres
>>>
>>>
>>
>
>
> --
> Ing. Ivaldi Andres
>
>
