hbase-user mailing list archives

From Ric Wang <wqt.w...@gmail.com>
Subject scanner on a given column: whole table scan or just the rows that have values
Date Tue, 09 Jun 2009 21:10:01 GMT

My HBase table has millions of rows, and for a given column (e.g.
familyA:labelB) only a couple of thousand rows actually have values (the
column is sparse). My task is to find the set of row keys whose value in
"familyA:labelB" satisfies some condition.

For that task, I get a scanner on the column "familyA:labelB" and loop over
the values of that column (I guess I'd be better off using some kind of
filter instead, but regardless...). If a value matches my condition, I take
the corresponding row key and add it to the result set.
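
To make the loop concrete, here is a rough, self-contained sketch of the
logic. It is NOT HBase client code: the table is modeled as an in-memory
sorted map, and the names SparseColumnScan and matchingRowKeys are made up
for illustration. It only shows what I do with the scanner results and says
nothing about how HBase executes the scan internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

public class SparseColumnScan {

    // Hypothetical helper: returns, in row-key order, the keys of rows whose
    // value in `column` exists and satisfies `condition`.
    static List<String> matchingRowKeys(
            TreeMap<String, Map<String, String>> table,
            String column,
            Predicate<String> condition) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            // Sparse column: most rows have no cell here, so get() returns null.
            String value = row.getValue().get(column);
            if (value != null && condition.test(value)) {
                result.add(row.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the table: row key -> (column -> value).
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("familyA:labelB", "42"));
        table.put("row2", Map.of("familyA:other", "x"));
        table.put("row3", Map.of("familyA:labelB", "7"));
        table.put("row4", Map.of());

        // Example condition (an assumption): the value parses to a number > 10.
        List<String> keys = matchingRowKeys(
                table, "familyA:labelB", v -> Integer.parseInt(v) > 10);
        System.out.println(keys);
    }
}
```

Note that this stand-in visits every row, which is exactly the behavior I am
asking about below for the real scanner.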

My questions are:

1. When the scanner loops over the column, is it scanning the whole table of
millions of rows, or mostly just the rows that actually have values for that
particular column? My guess is that it is NOT scanning the whole table, based
on my very limited understanding of how column-oriented databases work; a
full scan seems like it would be awfully inefficient. Can someone please
confirm?

2. If, unfortunately, a whole-table scan does have to happen, any
suggestions on how I could change my table design (adding an index, for
example) to avoid the performance hit?

Thanks very much for your help!
