spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Low Latency SQL query
Date Wed, 02 Dec 2015 07:35:57 GMT
You can try query pushdown by embedding the query when creating the RDD.
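The point of pushdown is to ship the aggregation into the SQL the database executes, so only the small aggregated result crosses the wire instead of the raw table. In Spark this is typically done by passing a subquery as the JDBC `dbtable` option; the effect can be sketched with plain sqlite3 (table and column names here are made up for illustration):

```python
import sqlite3

# Hypothetical products table; in the thread this lives in an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (product TEXT, family TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("A", "F1", 10.0), ("B", "F1", 5.0), ("C", "F2", 7.0)],
)

# Pushdown: the database performs the GROUP BY, so only the aggregated
# rows are transferred, not the raw table. With Spark's JDBC source the
# same effect comes from passing this query as the `dbtable` option,
# e.g. "(SELECT family, SUM(cost) FROM products GROUP BY family) q".
pushed = conn.execute(
    "SELECT family, SUM(cost) FROM products GROUP BY family ORDER BY family"
).fetchall()
print(pushed)  # [('F1', 15.0), ('F2', 7.0)]
```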
On 2 Dec 2015 12:32, "Fengdong Yu" <fengdongy@everstring.com> wrote:

> It depends on several factors:
>
> 1) What's your data format? CSV (text) or ORC/Parquet?
> 2) Do you have a data warehouse to summarize/cluster your data?
>
>
> If your data is text, or you query the raw data, it will be slow; Spark
> cannot do much to optimize your job.
>
>
>
>
> On Dec 2, 2015, at 9:21 AM, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>
> Mark, we have an application that uses data from different kinds of
> sources, and we built an engine able to handle that, but it can't scale
> to big data (it could, but that is too time-expensive) and it doesn't
> have a machine learning module, etc. We came across Spark and it looks
> like it has everything we need; actually it does, but our latency is
> very low right now, and in our tests Spark took too long for the same
> kind of results, always against an RDBMS, which is our primary source.
>
> So we want to expand our sources to CSV, web services, big data, etc. We
> can either extend our engine or use something like Spark, which gives us
> the power of clustering, access to different kinds of sources,
> streaming, machine learning, easy extensibility, and so on.
>
> On Tue, Dec 1, 2015 at 9:36 PM, Mark Hamstra <mark@clearstorydata.com>
> wrote:
>
>> I'd ask another question first: If your SQL query can be executed in a
>> performant fashion against a conventional (RDBMS?) database, why are you
>> trying to use Spark?  How you answer that question will be the key to
>> deciding among the engineering design tradeoffs to effectively use Spark or
>> some other solution.
>>
>> On Tue, Dec 1, 2015 at 4:23 PM, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>
>>> Ok, so the latency problem comes from using SQL as the source? What
>>> about CSV, Hive, or another source?
>>>
>>> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <mark@clearstorydata.com>
>>> wrote:
>>>
>>>> It is not designed for interactive queries.
>>>>
>>>>
>>>> You might want to ask the designers of Spark, Spark SQL, and
>>>> particularly some things built on top of Spark (such as BlinkDB) about
>>>> their intent with regard to interactive queries.  Interactive queries are
>>>> not the only designed use of Spark, but it is going too far to claim that
>>>> Spark is not designed at all to handle interactive queries.
>>>>
>>>> That being said, I think that you are correct to question the wisdom of
>>>> expecting lowest-latency query response from Spark using SQL (sic,
>>>> presumably a RDBMS is intended) as the datastore.
>>>>
>>>> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <jornfranke@gmail.com>
>>>> wrote:
>>>>
>>>>> Hmm, it will never be faster than the SQL database if you use that
>>>>> database as the underlying storage. Spark is (currently) an in-memory
>>>>> batch engine for iterative machine learning workloads. It is not
>>>>> designed for interactive queries. Currently Hive is moving in the
>>>>> direction of interactive queries; alternatives are Phoenix on HBase,
>>>>> or Impala.
>>>>>
>>>>> On 01 Dec 2015, at 21:58, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>>>
>>>>> Yes,
>>>>> The use case would be:
>>>>> Have Spark in a service (I didn't investigate this yet); through API
>>>>> calls to this service we perform some aggregations over data in SQL.
>>>>> We are already doing this with an internal development.
>>>>>
>>>>> Nothing complicated; for instance, a table with Product, Product
>>>>> Family, cost, price, etc., columns acting as dimensions and measures.
>>>>>
>>>>> I want Spark to query that table and perform a kind of rollup, with
>>>>> cost as the measure and Product, Product Family as the dimensions.
>>>>>
>>>>> Only 3 columns, yet it takes about 20s to perform that query and the
>>>>> aggregation, while the same query directly against the database, with
>>>>> a grouping on those columns, takes about 1s.
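The rollup described above (cost as the measure, Product and Product Family as dimensions) corresponds in Spark SQL to `df.rollup("family", "product").sum("cost")`. A minimal plain-Python sketch of what a rollup computes, with made-up rows:

```python
from collections import defaultdict

# Hypothetical rows: (product, product_family, cost).
rows = [("A", "F1", 10.0), ("B", "F1", 5.0), ("C", "F2", 7.0)]

# A rollup over (family, product) yields subtotals at every prefix of
# the dimension list: the finest (family, product) level, a per-family
# subtotal (family, None), and the grand total (None, None). Spark SQL
# expresses this as df.rollup("family", "product").sum("cost").
totals = defaultdict(float)
for product, family, cost in rows:
    totals[(family, product)] += cost   # finest level
    totals[(family, None)] += cost      # per-family subtotal
    totals[(None, None)] += cost        # grand total

print(sorted(totals.items(), key=str))
```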
>>>>>
>>>>> regards
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <jornfranke@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> can you elaborate more on the use case?
>>>>>>
>>>>>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <iaivaldi@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> >
>>>>>> > I'd like to use Spark to perform some transformations over data
>>>>>> > stored in SQL, but I need low latency. In my tests, Spark context
>>>>>> > creation and the data query over SQL take too long.
>>>>>> >
>>>>>> > Any ideas for speeding up the process?
>>>>>> >
>>>>>> > regards.
>>>>>> >
>>>>>> > --
>>>>>> > Ing. Ivaldi Andres
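A common way around the context-creation latency raised in the question above is to hold one long-lived SparkSession inside the service and reuse it across API calls, so the startup cost is paid once. A minimal sketch of that pattern (the cached object here is a hypothetical stand-in for `SparkSession.builder.getOrCreate()`, not a real session):

```python
import functools
import time

@functools.lru_cache(maxsize=1)
def get_session():
    # Stand-in for an expensive SparkSession.builder.getOrCreate();
    # real context creation can take several seconds.
    time.sleep(0.1)
    return object()

# The first call pays the startup cost; later API calls reuse the same
# session object instead of creating a new context per query.
s1 = get_session()
s2 = get_session()
print(s1 is s2)  # True
```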
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ing. Ivaldi Andres
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Ing. Ivaldi Andres
>>>
>>
>>
>
>
> --
> Ing. Ivaldi Andres
>
>
>
