spark-user mailing list archives

From Michal Klos <michal.klo...@gmail.com>
Subject Re: Low Latency SQL query
Date Wed, 02 Dec 2015 00:27:41 GMT
You should consider Presto for this use case. If you want fast "first query" times, it is a
better fit.

I think Spark SQL will catch up at some point, but if you need low latency and are not running
multiple queries against data cached in RDDs, it may not be a good fit.
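
To make the rollup from the quoted use case concrete, here is a minimal sketch of the difference between letting the database do the grouping and pulling every row out to aggregate elsewhere. It uses Python's built-in sqlite3 as a stand-in for the RDBMS; the table name, column names, and sample rows are hypothetical and not taken from the original poster's schema.

```python
import sqlite3

# Hypothetical stand-in for the RDBMS table described in the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (product TEXT, family TEXT, cost REAL, price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        ("widget", "hardware", 2.0, 3.5),
        ("widget", "hardware", 1.0, 3.5),
        ("gadget", "hardware", 4.0, 6.0),
        ("manual", "print", 0.5, 1.0),
    ],
)


def rollup_in_database(conn):
    """Let the database group and sum; only aggregated rows leave it."""
    rows = conn.execute(
        "SELECT family, product, SUM(cost) FROM products "
        "GROUP BY family, product ORDER BY family, product"
    )
    return [tuple(r) for r in rows]


def rollup_in_client(conn):
    """Fetch every row and aggregate outside the database."""
    totals = {}
    for family, product, cost in conn.execute(
        "SELECT family, product, cost FROM products"
    ):
        key = (family, product)
        totals[key] = totals.get(key, 0.0) + cost
    return [(f, p, c) for (f, p), c in sorted(totals.items())]


pushed = rollup_in_database(conn)
pulled = rollup_in_client(conn)
print(pushed)
print(pushed == pulled)
```

Both functions return the same totals; what differs is how many rows cross the database boundary. Whether an engine sitting on top of the database pushes such a grouping down depends on the engine, its version, and the data source, so this shows only where the work happens, not how Spark behaves.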

M

> On Dec 1, 2015, at 7:23 PM, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
> 
> OK, so the latency problem arises because I'm using SQL as the source? How about
> CSV, Hive, or another source?
> 
> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <mark@clearstorydata.com> wrote:
>>> It is not designed for interactive queries.
>> 
>> You might want to ask the designers of Spark, Spark SQL, and particularly some things
>> built on top of Spark (such as BlinkDB) about their intent with regard to interactive queries.
>> Interactive queries are not the only designed use of Spark, but it is going too far to claim
>> that Spark is not designed at all to handle interactive queries.
>> 
>> That being said, I think that you are correct to question the wisdom of expecting
>> lowest-latency query response from Spark using SQL (sic, presumably an RDBMS is intended) as
>> the datastore.
>> 
>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>>> Hmm, it will never be faster than SQL if you use SQL as the underlying storage.
>>> Spark is (currently) an in-memory batch engine for iterative machine learning workloads. It
>>> is not designed for interactive queries.
>>> Hive is currently moving in the direction of interactive queries. Alternatives
>>> are Phoenix on HBase or Impala.
>>> 
>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>> 
>>>> Yes,
>>>> The use case would be:
>>>> Have Spark in a service (I haven't investigated this yet); through API calls
>>>> to this service we perform some aggregations over data in SQL. We are already doing this with
>>>> an internal development.
>>>> 
>>>> Nothing complicated: for instance, a table with Product, Product Family,
>>>> cost, price, etc., where the columns act as dimensions and measures.
>>>> 
>>>> I want Spark to query that table and perform a kind of rollup, with cost
>>>> as the measure and Product and Product Family as the dimensions.
>>>> 
>>>> With only 3 columns, it takes about 20s to perform that query and the aggregation;
>>>> the same query run directly against the database, grouping on those columns, takes about 1s.
>>>> 
>>>> regards
>>>> 
>>>> 
>>>> 
>>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>> Can you elaborate more on the use case?
>>>>> 
>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I'd like to use Spark to perform some transformations over data
>>>>> > stored in SQL, but I need low latency. I'm running some tests and finding that
>>>>> > Spark context creation and the data query over SQL take too long.
>>>>> >
>>>>> > Any ideas for speeding up the process?
>>>>> >
>>>>> > regards.
>>>>> >
>>>>> > --
>>>>> > Ing. Ivaldi Andres
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Ing. Ivaldi Andres
> 
> 
> 
> -- 
> Ing. Ivaldi Andres
