hbase-user mailing list archives

From Shengjie Min <kelvin....@gmail.com>
Subject Re: HBase - Secondary Index
Date Fri, 28 Dec 2012 10:55:41 GMT
> Yes, as you say, when the number of rows to be returned grows, the latency
> grows too. Seeks within an HFile block are a somewhat expensive operation
> now (not much, but still). The new prefix trie encoding will be a huge bonus
> here; with it, seeks will be flying. [Ted also presented this at Hadoop
> China.] Thanks to Matt... :) I am trying to measure scan performance with
> this new encoding, and am trying to backport a simple patch to the 0.94
> version just for testing. Yes, when the number of results to be returned
> keeps growing, any index will perform worse, as per my study :)

Yes, you are right. I guess it's just a drawback of any index approach.
Thanks for the explanation.

Shengjie
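The conclusion above, that any index loses ground as the result set grows, can be made concrete with a toy cost model. This is a hedged illustration, not a measurement: it assumes a non-indexed query scans every row in the table and filters, that an indexed query pays one random seek per matching row, and the unit costs (`READ_COST`, `SEEK_COST`, `RETURN_COST`) are hypothetical numbers picked only to show the shape of the curves. Both costs grow linearly with the number of result rows, but the indexed line is steeper, which matches the trend described for the scan results in the presentation.

```java
/**
 * Toy cost model with HYPOTHETICAL unit costs, comparing a full-table scan
 * plus filter against an index lookup followed by one seek per matching row.
 */
final class IndexCostModel {
    // Hypothetical relative costs; real values depend on hardware, block
    // size, data block encoding and caching.
    static final double READ_COST = 1.0;   // sequentially read + filter one row
    static final double SEEK_COST = 50.0;  // one random seek into an HFile block
    static final double RETURN_COST = 2.0; // ship one matching row to the client

    /** No index: every row is read; matching rows are also returned. */
    static double fullScanCost(long totalRows, long resultRows) {
        return totalRows * READ_COST + resultRows * RETURN_COST;
    }

    /** Index path: one seek per matching row, plus returning it. */
    static double indexedCost(long resultRows) {
        return resultRows * (SEEK_COST + RETURN_COST);
    }

    /** Result-row count at which both strategies cost the same. */
    static double breakEvenRows(long totalRows) {
        return totalRows * READ_COST / SEEK_COST;
    }

    public static void main(String[] args) {
        long totalRows = 1_000_000L;
        for (long k : new long[] {100, 1_000, 20_000, 100_000}) {
            System.out.printf("k=%d scan=%.0f indexed=%.0f%n",
                    k, fullScanCost(totalRows, k), indexedCost(k));
        }
        System.out.printf("break-even at ~%.0f result rows%n", breakEvenRows(totalRows));
    }
}
```

With these made-up costs and a 1,000,000-row table, the two strategies break even at 20,000 result rows: below that the index wins, above it the full scan does. Cheaper seeks, which is what the prefix trie encoding is hoped to deliver, would push the break-even point higher.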

On 28 December 2012 04:14, Anoop Sam John <anoopsj@huawei.com> wrote:

> > Do you have a link to that presentation?
>
> http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf
>
> -Anoop-
>
> ________________________________________
> From: Mohit Anchlia [mohitanchlia@gmail.com]
> Sent: Friday, December 28, 2012 9:12 AM
> To: user@hbase.apache.org
> Subject: Re: HBase - Secondary Index
>
> On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John <anoopsj@huawei.com> wrote:
>
> > Yes, as you say, when the number of rows to be returned grows, the latency
> > grows too. Seeks within an HFile block are a somewhat expensive operation
> > now (not much, but still). The new prefix trie encoding will be a huge
> > bonus here; with it, seeks will be flying. [Ted also presented this at
> > Hadoop China.] Thanks to Matt... :) I am trying to measure scan
> > performance with this new encoding, and am trying to backport a simple
> > patch to the 0.94 version just for testing. Yes, when the number of
> > results to be returned keeps growing, any index will perform worse, as
> > per my study :)
> >
> > Do you have a link to that presentation?
>
>
> > > BTW, quick question: in your presentation, is the scale in seconds or
> > > milliseconds? :)
> >
> > It is seconds. Don't focus on the exact values; what matters is the
> > percentage increase in latency :) Those were not high-end machines.
> >
> > -Anoop-
> > ________________________________________
> > From: Shengjie Min [kelvin.msj@gmail.com]
> > Sent: Thursday, December 27, 2012 9:59 PM
> > To: user@hbase.apache.org
> > Subject: Re: HBase - Secondary Index
> >
> > > I didn't completely follow you here. There won't be any get() happening.
> > > Since we get the exact rowkey within a region from the index table, we
> > > can seek to the exact position and return that row.
> >
> > Sorry, I misused "get()" here; I meant seeking. Yes, if only a small
> > number of rows is returned, this works perfectly. As you said, you get
> > the exact rowkey positions per region and simply seek to them. I was
> > trying to work out the case where the number of result rows increases
> > massively. In Anil's case, for example, he wants to run a scan query
> > against the secondary index (timestamp), "select all rows from timestamp1
> > to timestamp2", with no customerId provided. During that time period, he
> > might have a big chunk of rows from different customerIds. The index
> > table returns a lot of rowkey positions for different customerIds (I
> > believe they are scattered across different regions), so you end up
> > seeking to all those positions in different regions to return all the
> > rows needed. According to page 14 of your presentation, "Performance Test
> > Results (Scan)", without the index the time increases linearly as the
> > number of result rows increases; with the index, on the other hand, the
> > time spent climbs much more quickly.
> >
> > BTW, quick question: in your presentation, is the scale in seconds or
> > milliseconds? :)
> >
> > - Shengjie
> >
> >
> > On 27 December 2012 15:54, Anoop John <anoop.hbase@gmail.com> wrote:
> >
> > > > how the massive number of get() is going to
> > > > perform against the main table
> > >
> > > I didn't completely follow you here. There won't be any get() happening.
> > > Since we get the exact rowkey within a region from the index table, we
> > > can seek to the exact position and return that row.
> > >
> > > -Anoop-
> > >
> > > On Thu, Dec 27, 2012 at 6:37 PM, Shengjie Min <kelvin.msj@gmail.com> wrote:
> > >
> > > > how the massive number of get() is going to
> > > > perform against the main table
> > > >
> > >
> >
> >
> >
> > --
> > All the best,
> > Shengjie Min
> >
>
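The index-then-seek flow discussed in this thread (the index table yields exact rowkeys, and the lookup goes straight to those positions instead of issuing get() calls or scanning everything) can be modelled in plain Java. This is a hypothetical in-memory sketch, not HBase code and not the implementation from the presentation: a `TreeMap` stands in for one region's sorted rows, a second `TreeMap` keyed by timestamp stands in for the index table, and the rowkey layout `customerId|timestamp` is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * HYPOTHETICAL in-memory model of an index-then-seek range query.
 * TreeMaps stand in for a single region's sorted rows and for the index.
 */
final class IndexSeekSketch {
    // Sorted "region": rowkey -> value. Assumed rowkey layout: customerId|timestamp.
    final TreeMap<String, String> region = new TreeMap<>();
    // "Index table": timestamp -> rowkeys carrying that timestamp.
    final TreeMap<Long, List<String>> index = new TreeMap<>();
    int seeks = 0;    // positioned lookups made by the index path
    int rowsRead = 0; // rows touched by the full-scan path

    void put(String customerId, long ts, String value) {
        String rowkey = customerId + "|" + ts;
        region.put(rowkey, value);
        index.computeIfAbsent(ts, t -> new ArrayList<>()).add(rowkey);
    }

    /** Index path: range-scan the index, then jump to each exact rowkey. */
    List<String> viaIndex(long fromTs, long toTs) {
        List<String> out = new ArrayList<>();
        for (List<String> rowkeys : index.subMap(fromTs, true, toTs, true).values()) {
            for (String rk : rowkeys) {
                seeks++;
                out.add(region.get(rk)); // O(log n) lookup standing in for a seek
            }
        }
        return out; // timestamp order
    }

    /** No index: read every row and keep those whose timestamp is in range. */
    List<String> viaFullScan(long fromTs, long toTs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : region.entrySet()) {
            rowsRead++;
            String key = e.getKey();
            long ts = Long.parseLong(key.substring(key.indexOf('|') + 1));
            if (ts >= fromTs && ts <= toTs) {
                out.add(e.getValue());
            }
        }
        return out; // rowkey order
    }

    /** Small sample data set used by main() and for experimentation. */
    static IndexSeekSketch sample() {
        IndexSeekSketch s = new IndexSeekSketch();
        s.put("c1", 100, "a");
        s.put("c2", 150, "b");
        s.put("c1", 200, "c");
        s.put("c3", 300, "d");
        return s;
    }

    public static void main(String[] args) {
        IndexSeekSketch s = sample();
        System.out.println("index: " + s.viaIndex(100, 200) + " seeks=" + s.seeks);
        System.out.println("scan:  " + s.viaFullScan(100, 200) + " rowsRead=" + s.rowsRead);
    }
}
```

Running `main` prints `index: [a, b, c] seeks=3` and `scan:  [a, c, b] rowsRead=4`: the index path touches only the matching rows, while the scan touches every row. The catch raised in the thread is that each real seek costs more than a sequential read, so once the matching rows become a large share of the table (scattered across many regions in practice), the seeks add up faster than the scan does.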



-- 
All the best,
Shengjie Min
