spark-user mailing list archives

From Xiao Li <gatorsm...@gmail.com>
Subject Re: Low Latency SQL query
Date Wed, 02 Dec 2015 00:45:29 GMT
http://cacm.acm.org/magazines/2011/6/108651-10-rules-for-scalable-performance-in-simple-operation-datastores/fulltext

Try to read this article. It might help you understand your problem.

Thanks,

Xiao Li

2015-12-01 16:36 GMT-08:00 Mark Hamstra <mark@clearstorydata.com>:

> I'd ask another question first: If your SQL query can be executed in a
> performant fashion against a conventional (RDBMS?) database, why are you
> trying to use Spark?  How you answer that question will be the key to
> deciding among the engineering design tradeoffs to effectively use Spark or
> some other solution.
>
> On Tue, Dec 1, 2015 at 4:23 PM, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>
>> OK, so the latency problem is caused by using SQL as the
>> source? What about CSV, Hive, or another source?
>>
>> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <mark@clearstorydata.com>
>> wrote:
>>
>>> > It is not designed for interactive queries.
>>>
>>>
>>> You might want to ask the designers of Spark, Spark SQL, and
>>> particularly some things built on top of Spark (such as BlinkDB) about
>>> their intent with regard to interactive queries.  Interactive queries are
>>> not the only designed use of Spark, but it is going too far to claim that
>>> Spark is not designed at all to handle interactive queries.
>>>
>>> That being said, I think that you are correct to question the wisdom of
>>> expecting lowest-latency query response from Spark using SQL (sic,
>>> presumably an RDBMS is intended) as the datastore.
>>>
>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfranke@gmail.com>
>>> wrote:
>>>
>>>> Hmm, it will never be faster than SQL if you use SQL as the underlying
>>>> storage. Spark is (currently) an in-memory batch engine for iterative
>>>> machine learning workloads. It is not designed for interactive queries.
>>>> Hive is currently moving in the direction of interactive queries.
>>>> Alternatives are Phoenix on HBase, or Impala.
>>>>
>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>>
>>>> Yes.
>>>> The use case would be:
>>>> Run Spark as a service (I haven't investigated this yet); through API
>>>> calls to this service we perform some aggregations over data in SQL. We are
>>>> already doing this with an in-house development.
>>>>
>>>> Nothing complicated: for instance, a table with Product, Product
>>>> Family, cost, price, etc., whose columns act as Dimensions and Measures.
>>>>
>>>> I want to use Spark to query that table and perform a kind of rollup, with
>>>> cost as the Measure and Product and Product Family as Dimensions.
>>>>
>>>> With only 3 columns, it takes about 20 s to perform that query and the
>>>> aggregation, while querying the database directly with a GROUP BY on
>>>> those columns takes about 1 s.
>>>>
>>>> regards
>>>>
>>>>
>>>>
>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfranke@gmail.com>
>>>> wrote:
>>>>
>>>>> Can you elaborate on the use case?
>>>>>
>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>>> >
>>>>> > Hi,
>>>>> >
>>>>> > I'd like to use Spark to perform some transformations over data
>>>>> > stored in SQL, but I need low latency. I'm doing some tests and I found
>>>>> > that Spark context creation and querying the data over SQL take too long.
>>>>> >
>>>>> > Any ideas to speed up the process?
>>>>> >
>>>>> > regards.
>>>>> >
>>>>> > --
>>>>> > Ing. Ivaldi Andres
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ing. Ivaldi Andres
>>>>
>>>>
>>>
>>
>>
>> --
>> Ing. Ivaldi Andres
>>
>
>
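The rollup described in the use case above (cost as the Measure, Product Family and Product as Dimensions) can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under stated assumptions: the table name `products`, its columns, and the sample rows are hypothetical, not data from the thread. It computes the same rollup two ways: with the grouping pushed into the database, and by fetching every row and aggregating in the client, which is roughly what happens when Spark reads a table over JDBC and cannot push the aggregation down. The results are identical; what differs is how many rows leave the database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows (product, product_family, cost), for illustration only.
ROWS = [
    ("Laptop", "Computers", 900.0),
    ("Desktop", "Computers", 700.0),
    ("Laptop", "Computers", 850.0),
    ("Phone", "Mobile", 400.0),
    ("Tablet", "Mobile", 300.0),
]

# ROLLUP(product_family, product) written as three grouping levels, because
# SQLite has no GROUP BY ROLLUP. NULL marks a rolled-up (subtotal) column.
ROLLUP_SQL = """
SELECT product_family, product, SUM(cost) FROM products
 GROUP BY product_family, product
UNION ALL
SELECT product_family, NULL, SUM(cost) FROM products
 GROUP BY product_family
UNION ALL
SELECT NULL, NULL, SUM(cost) FROM products
"""


def load(conn):
    """Create the hypothetical products table and fill it with ROWS."""
    conn.execute("CREATE TABLE products (product TEXT, product_family TEXT, cost REAL)")
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", ROWS)


def _order(row):
    # Detail rows before subtotals, subtotals before the grand total
    # (None cannot be compared with str directly in Python 3).
    family, product, _ = row
    return (family is None, family or "", product is None, product or "")


def rollup_in_database(conn):
    """Aggregation pushed down: the database returns only the summary rows."""
    return sorted(conn.execute(ROLLUP_SQL).fetchall(), key=_order)


def rollup_in_client(conn):
    """Every row is fetched, then aggregated in the application."""
    totals = defaultdict(float)
    for product, family, cost in conn.execute(
        "SELECT product, product_family, cost FROM products"
    ):
        totals[(family, product)] += cost  # detail level
        totals[(family, None)] += cost     # per-family subtotal
        totals[(None, None)] += cost       # grand total
    return sorted(((f, p, s) for (f, p), s in totals.items()), key=_order)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn)
    for row in rollup_in_database(conn):
        print(row)
    assert rollup_in_database(conn) == rollup_in_client(conn)
```

SQLite lacks `GROUP BY ROLLUP`, so the grouping levels are spelled out with `UNION ALL`; engines that support `ROLLUP`, including Spark SQL, express the same query more compactly.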

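Part of the latency reported in the first message is context creation. The "Spark in a service" idea mentioned above usually means paying that cost once, when the service starts, and reusing the same context for every query (the Spark SQL Thrift JDBC/ODBC server works this way). The sketch below shows only that reuse pattern in plain Python, as an assumption-laden illustration: `LazyContext` and `slow_factory` are hypothetical names, and the factory merely simulates a slow start-up rather than creating a real SparkContext.

```python
import threading
import time


class LazyContext:
    """Build an expensive context on first use and share it afterwards.

    `factory` is a hypothetical stand-in for whatever is slow to create
    (in the thread, a SparkContext); it is called at most once.
    """

    def __init__(self, factory):
        self._factory = factory
        self._lock = threading.Lock()
        self._instance = None

    def get(self):
        # Fast path: once the context exists, no locking is needed.
        if self._instance is None:
            with self._lock:
                # Re-check inside the lock so concurrent first callers
                # do not each build their own context.
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


def slow_factory():
    """Hypothetical factory that simulates a slow start-up."""
    time.sleep(0.2)
    return object()


if __name__ == "__main__":
    ctx = LazyContext(slow_factory)
    for request in range(3):
        start = time.perf_counter()
        ctx.get()
        print(f"request {request}: {time.perf_counter() - start:.3f}s")
```

Only the first request pays the start-up cost; later requests get the same object back immediately. This removes start-up from the request path but does nothing for the query itself, which is what the rest of the thread is about.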