spark-user mailing list archives

From Andrés Ivaldi <iaiva...@gmail.com>
Subject Re: Low Latency SQL query
Date Wed, 02 Dec 2015 00:23:19 GMT
OK, so the latency problem comes from using SQL as the source? What about
CSV, Hive, or another source?

On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> It is not designed for interactive queries.
>
>
> You might want to ask the designers of Spark, Spark SQL, and particularly
> some things built on top of Spark (such as BlinkDB) about their intent with
> regard to interactive queries.  Interactive queries are not the only
> designed use of Spark, but it is going too far to claim that Spark is not
> designed at all to handle interactive queries.
>
> That being said, I think that you are correct to question the wisdom of
> expecting lowest-latency query response from Spark using SQL (sic,
> presumably an RDBMS is intended) as the datastore.
>
> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>
>> Hmm, it will never be faster than SQL if you use SQL as the underlying
>> storage. Spark is (currently) an in-memory batch engine for iterative
>> machine learning workloads. It is not designed for interactive queries.
>> Hive is currently moving in the direction of interactive queries.
>> Alternatives are Phoenix on HBase, or Impala.
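The point above is that Spark cannot beat the database when the database is also the storage. One way to keep the fast path while still using Spark is to let the database do the grouping and have Spark read only the aggregated rows: Spark's JDBC data source accepts a parenthesized subquery with an alias in its `dbtable` option. A minimal sketch, assuming PySpark with the database's JDBC driver on the classpath; `pushdown_query`, `load_aggregated`, the table `products`, and the column names are hypothetical illustrations, not part of the original thread.

```python
def pushdown_query(table, dims, measure):
    """Return a derived-table expression for Spark's JDBC `dbtable` option.

    The GROUP BY runs inside the database, so Spark only receives the
    already-aggregated rows. Identifiers are interpolated verbatim, so they
    must come from trusted configuration, never from user input.
    """
    cols = ", ".join(dims)
    # No "AS" before the table alias: some databases (e.g. Oracle) reject it there.
    return (
        f"(SELECT {cols}, SUM({measure}) AS total_{measure} "
        f"FROM {table} GROUP BY {cols}) agg"
    )


def load_aggregated(spark, jdbc_url, user, password, table, dims, measure):
    """Read the pre-aggregated result through Spark's JDBC data source.

    `spark` is assumed to be a SparkSession (or, on Spark 1.4+, a SQLContext);
    both expose `.read`. The JDBC driver must be on Spark's classpath.
    """
    return (
        spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", pushdown_query(table, dims, measure))
        .option("user", user)
        .option("password", password)
        .load()
    )
```

This does not remove Spark's own startup and scheduling overhead; it only avoids shipping the raw table to Spark and grouping it there.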
>>
>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>
>> Yes,
>> the use case would be:
>> run Spark inside a service (I haven't investigated this yet); through API
>> calls to that service we perform some aggregations over data in SQL. We
>> are already doing this with an in-house development.
>>
>> Nothing complicated: for instance, a table with Product, Product Family,
>> cost, price, etc., with the columns acting as dimensions and measures.
>>
>> I want to use Spark to query that table and perform a kind of rollup,
>> with cost as the measure and Product and Product Family as dimensions.
>>
>> With only 3 columns, it takes about 20 s to run that query and the
>> aggregation; querying the database directly with a GROUP BY on those
>> columns takes about 1 s.
>>
>> regards
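To make the "kind of rollup" concrete: with cost as the measure and Product Family and Product as dimensions, a rollup returns per-product totals, per-family subtotals, and a grand total. A minimal sketch using Python's built-in `sqlite3` as a stand-in for the RDBMS; the table `products`, its columns, the sample rows, and the helper `rollup_cost` are all hypothetical. SQLite has no `GROUP BY ROLLUP`, so the three levels are emulated with `UNION ALL`; databases that support `ROLLUP` (e.g. SQL Server, PostgreSQL 9.5+) can express it in a single clause.

```python
import sqlite3


def rollup_cost(conn):
    """Sum cost at three levels: (family, product), (family), and grand total.

    Emulates GROUP BY ROLLUP(family, product); NULL marks a rolled-up
    dimension. Row order is unspecified because there is no ORDER BY.
    """
    sql = """
        SELECT family, product, SUM(cost) FROM products GROUP BY family, product
        UNION ALL
        SELECT family, NULL, SUM(cost) FROM products GROUP BY family
        UNION ALL
        SELECT NULL, NULL, SUM(cost) FROM products
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    # Hypothetical in-memory table with made-up rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (product TEXT, family TEXT, cost REAL, price REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [("A1", "A", 10.0, 15.0), ("A2", "A", 5.0, 8.0), ("B1", "B", 7.0, 9.0)],
    )
    for row in rollup_cost(conn):
        print(row)
```

Because the database does the grouping, only a handful of summary rows leave it, which is the same advantage the direct GROUP BY query already has.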
>>
>>
>>
>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfranke@gmail.com> wrote:
>>
>>> Can you elaborate on the use case?
>>>
>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>> >
>>> > Hi,
>>> >
>>> > I'd like to use Spark to perform some transformations over data stored
>>> in SQL, but I need low latency. I'm running some tests, and Spark context
>>> creation and the data query over SQL take too long.
>>> >
>>> > Any ideas for speeding up the process?
>>> >
>>> > regards.
>>> >
>>> > --
>>> > Ing. Ivaldi Andres
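Part of the latency mentioned above is Spark context creation. Since the plan is to run Spark inside a service, one common approach is to create the context once when the service starts and reuse it for every API call, so each request pays only for its own query. A minimal sketch, assuming a generic `factory` callable; the `SharedContext` class is a hypothetical helper. In PySpark the factory might be `lambda: SparkSession.builder.appName("query-service").getOrCreate()`; `getOrCreate()` already returns an existing session within the same process, so the real gain comes from keeping that process alive between requests instead of launching a new driver per query.

```python
import threading


class SharedContext:
    """Create an expensive context once (e.g. a Spark session), then reuse it."""

    def __init__(self, factory):
        self._factory = factory
        self._ctx = None
        self._lock = threading.Lock()

    def get(self):
        # Fast path: the context already exists, so no locking is needed.
        if self._ctx is None:
            with self._lock:
                # Re-check under the lock so the factory runs at most once,
                # even if two requests arrive concurrently.
                if self._ctx is None:
                    self._ctx = self._factory()
        return self._ctx
```

Each request handler then calls `get()` instead of building its own context.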
>>>
>>
>>
>>
>>
>>
>


-- 
Ing. Ivaldi Andres
