hbase-user mailing list archives

From Ric Wang <wqt.w...@gmail.com>
Subject Re: scanner on a given column: whole table scan or just the rows that have values
Date Wed, 10 Jun 2009 02:23:37 GMT
How does the scanner know how to get ONLY the "relevant" rows, without a
whole table scan?
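One way to picture how a scan on a single column can skip most of the table is a toy model. As I understand HBase's design, each column family is stored separately in row-key order, and a row with no value in a column simply has no cell stored for it. The Java sketch below is a simplified, in-memory model of that idea, not HBase code; `SparseTable`, `put`, `matchingRows` and `rowsVisited` are hypothetical names. It also applies the condition from the question as a predicate during the scan and returns the matching row keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Toy model of a sparse, column-family-oriented table. NOT HBase code:
// the class, method and field names here are hypothetical.
class SparseTable {
    // family -> (row key -> (qualifier -> value)). Rows are kept sorted, and a
    // row appears under a family only if it has at least one cell there.
    private final Map<String, TreeMap<String, Map<String, String>>> stores = new HashMap<>();

    // Number of stored rows the last call to matchingRows actually visited.
    int rowsVisited = 0;

    void put(String row, String family, String qualifier, String value) {
        stores.computeIfAbsent(family, f -> new TreeMap<>())
              .computeIfAbsent(row, r -> new HashMap<>())
              .put(qualifier, value);
    }

    // Scan one column: only the given family's store is read, so rows with
    // no cells in that family are never touched.
    List<String> matchingRows(String family, String qualifier, Predicate<String> condition) {
        List<String> out = new ArrayList<>();
        rowsVisited = 0;
        TreeMap<String, Map<String, String>> store = stores.get(family);
        if (store == null) {
            return out;
        }
        for (Map.Entry<String, Map<String, String>> e : store.entrySet()) {
            rowsVisited++; // every row stored under this family is read...
            String value = e.getValue().get(qualifier);
            // ...but only rows holding this qualifier are tested against the condition.
            if (value != null && condition.test(value)) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

In this model the cost of a column scan grows with the number of rows that have data in the scanned family, not with the size of the table; rows that hold only other qualifiers of the same family are still read and skipped, so a sparse column kept in its own family avoids even that.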

Thanks!
Ric



On Tue, Jun 9, 2009 at 4:31 PM, Naveen Koorakula <naveenk@gmail.com> wrote:

> The scanner only scans the relevant rows.
>
> On Tue, Jun 9, 2009 at 2:10 PM, Ric Wang <wqt.work@gmail.com> wrote:
>
> > Hi,
> >
> > My HBase table has millions of rows, and on a given column (e.g.
> > familyA:labelB) only a couple of thousand rows actually have values
> > (the column is sparse).
> > Now my task is to find the set of row keys whose value in column
> > "familyA:labelB" satisfies some condition.
> >
> > For that task, I am getting a scanner on the column "familyA:labelB" and
> > looping over the values of that column (I guess I'd be better off using
> > some kind of filter instead, but regardless...); if a value matches my
> > condition, I get the corresponding row key and add it to the result set.
> >
> > My questions are:
> >
> > 1. When the scanner loops over the column, is it scanning the whole table
> > of millions of rows, or mostly just the rows that actually have values for
> > that particular column? My guess, per my very limited understanding of how
> > column-based databases work, is that it's NOT scanning the whole table;
> > that would be awfully inefficient. Can someone please let me know?
> >
> > 2. If, in the unfortunate case, a whole table scan does have to happen,
> > do you have any suggestions on how I could change my table design (adding
> > an index?) to avoid the performance hit?
> >
> > Thanks very much for your help!
> > Ric
> >
>
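Even when a scan only touches rows that have the column, evaluating a condition on the values still means reading every one of those cells. For question 2, a common pattern is a secondary index: a second table sorted by the column value and updated on every write, so that a condition on the value becomes a short range read. The sketch below models this in memory with Java's `TreeMap`; `ColumnIndex`, `put` and `rowsInRange` are hypothetical names, and it assumes the condition is a numeric range `[lo, hi]`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical secondary index for one column (e.g. familyA:labelB).
// In-memory sketch only; not an HBase API.
class ColumnIndex {
    // "Main table" view: row key -> current value of the indexed column.
    private final Map<String, Long> byRow = new HashMap<>();
    // "Index table": value -> row keys holding that value, sorted by value.
    private final TreeMap<Long, TreeSet<String>> byValue = new TreeMap<>();

    // Every write to the column must also update the index, including
    // removing the row from its old value's entry.
    void put(String row, long value) {
        Long old = byRow.put(row, value);
        if (old != null) {
            TreeSet<String> rows = byValue.get(old);
            rows.remove(row);
            if (rows.isEmpty()) {
                byValue.remove(old);
            }
        }
        byValue.computeIfAbsent(value, v -> new TreeSet<>()).add(row);
    }

    // Row keys whose value lies in [lo, hi], found by a range read on the
    // index instead of a pass over every cell of the column.
    TreeSet<String> rowsInRange(long lo, long hi) {
        TreeSet<String> result = new TreeSet<>();
        NavigableMap<Long, TreeSet<String>> slice = byValue.subMap(lo, true, hi, true);
        for (TreeSet<String> rows : slice.values()) {
            result.addAll(rows);
        }
        return result;
    }
}
```

The trade-off is extra work on every write. In a real cluster the main table and the index table are written separately, so keeping them consistent falls to the application; this single-threaded sketch sidesteps that problem.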



-- 
Ric Wang
wqt.work@gmail.com
