spark-user mailing list archives

From Siddharth Ubale <siddharth.ub...@syncoms.com>
Subject Re: real time Query engine Spark-SQL on Hbase
Date Fri, 01 May 2015 19:30:38 GMT
Hi,


Thanks for the reply.


Hbase cli takes less than 500 ms for the same query.

I am running a simple query, i.e. "Select * from Customers where c_id='123123'".

Why would the same query, which takes 500 ms in the HBase CLI, end up taking around 8 seconds via Spark SQL?

I am unable to understand this.


Thanks,

Siddharth





________________________________
From: ayan guha <guha.ayan@gmail.com>
Sent: 01 May 2015 04:38
To: Ted Yu
Cc: user@spark.apache.org; Siddharth Ubale; matei.zaharia@gmail.com; Prakash Hosalli; Amit Kumar
Subject: Re: real time Query engine Spark-SQL on Hbase


And if I may ask, how long does it take in the HBase CLI? I would not expect Spark to improve
on HBase's performance. At best, Spark will push the filter down to HBase. So I would try to
reduce any additional overhead, such as bringing data into Spark.
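
As a rough illustration of why that overhead matters, here is a minimal plain-Java sketch (no Spark or HBase involved) that contrasts filtering after loading every row with a direct keyed lookup. The class ScanVsLookup, the Customer type, and the synthetic ids are hypothetical, and the sketch assumes c_id acts as a lookup key, which the thread does not state. It only models the general cost difference, not how Spark SQL or HBase actually execute the query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScanVsLookup {

    // Hypothetical row type standing in for the Customers bean in the thread.
    static final class Customer {
        final String cId;
        final String name;

        Customer(String cId, String name) {
            this.cId = cId;
            this.name = name;
        }
    }

    // Builds n synthetic rows with ids "0", "1", ..., "n-1".
    static List<Customer> buildRows(int n) {
        List<Customer> rows = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            rows.add(new Customer(String.valueOf(i), "customer-" + i));
        }
        return rows;
    }

    // Filter applied after loading: every row is examined,
    // so the work grows with the size of the table.
    static List<Customer> fullScan(List<Customer> rows, String cId) {
        List<Customer> hits = new ArrayList<>();
        for (Customer c : rows) {
            if (c.cId.equals(cId)) {
                hits.add(c);
            }
        }
        return hits;
    }

    // Index keyed by c_id (assumption: c_id can serve as a lookup key).
    static Map<String, Customer> buildIndex(List<Customer> rows) {
        Map<String, Customer> index = new HashMap<>();
        for (Customer c : rows) {
            index.put(c.cId, c);
        }
        return index;
    }

    // Keyed lookup: a single hash probe, independent of the row count.
    static Customer keyedLookup(Map<String, Customer> index, String cId) {
        return index.get(cId);
    }

    public static void main(String[] args) {
        List<Customer> rows = buildRows(350_000); // roughly 3.5 lakh rows
        Map<String, Customer> index = buildIndex(rows);

        long t0 = System.nanoTime();
        List<Customer> scanned = fullScan(rows, "123123");
        long t1 = System.nanoTime();
        Customer found = keyedLookup(index, "123123");
        long t2 = System.nanoTime();

        System.out.println("full scan:    " + scanned.size() + " row(s), "
                + (t1 - t0) / 1_000 + " us");
        System.out.println("keyed lookup: " + found.name + ", "
                + (t2 - t1) / 1_000 + " us");
    }
}
```

If the query path behaves like fullScan, latency would be expected to rise as rows are added, while a path that behaves like keyedLookup would stay roughly flat.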

On 1 May 2015 00:56, "Ted Yu" <yuzhihong@gmail.com> wrote:
bq. a single query on one filter criteria

Can you tell us more about your filter? How selective is it?

Which HBase release are you using?

Cheers

On Thu, Apr 30, 2015 at 7:23 AM, Siddharth Ubale <siddharth.ubale@syncoms.com> wrote:
Hi,

I want to use Spark as a query engine on HBase with sub-second latency.

I am using Spark 1.3 and followed the steps below on an HBase table with around 3.5 lakh
(350,000) rows:


1.       Mapped the DataFrame to the HBase table. RDDCustomers maps to the HBase table and is
used to create the DataFrame:

"DataFrame schemaCustomers = sqlInstance.createDataFrame(SparkContextImpl.getRddCustomers(), Customers.class);"

2.       Registered it as a temp table, i.e. "schemaCustomers.registerTempTable("customers");"

3.       Ran the query on the DataFrame using the SQLContext instance.

What I am observing is that a single query with one filter criterion takes 7-8 seconds, and
the time increases as I increase the number of rows in the HBase table.
Also, there was one time when I got a query response within 1-2 seconds, which seems like
strange behavior.
Is this expected behavior from Spark, or am I missing something here?
Can somebody help me understand this scenario? Please assist.

Thanks,
Siddharth Ubale


