hbase-user mailing list archives

From <Brian.Lev...@nokia.com>
Subject RE: scanner on a given column: whole table scan or just the rows that have values
Date Wed, 10 Jun 2009 00:41:12 GMT
My guess is that the scanner actually does examine every row. As you suggest, adding a RowFilter
would be the way to go here.  This way, you're certain to get back only those rows that match
the criteria expressed in the RowFilter.
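To make the reply's point concrete, here is a minimal sketch in plain Java. It assumes, as the reply guesses, that every row is still examined. The table is a hypothetical in-memory `TreeMap`, and `FilteredScan`, `scan` and `ScanResult` are illustrative names, not the HBase API. The predicate plays the role of the RowFilter: it is applied before a row is returned, so fewer rows come back even though the number of rows examined does not change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical model, not the HBase client API: a table is a sorted map of
// row key -> (column name -> value), and a "filter" is a predicate on a row.
public class FilteredScan {

    // Outcome of one scan: how many rows were looked at, and which were returned.
    public record ScanResult(int rowsExamined, List<String> rowsReturned) {}

    // Visits every row in key order (so rowsExamined always equals the table size)
    // but only returns the keys of rows that pass the filter.
    public static ScanResult scan(SortedMap<String, Map<String, String>> table,
                                  Predicate<Map<String, String>> filter) {
        int examined = 0;
        List<String> returned = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            examined++;
            if (filter.test(row.getValue())) {
                returned.add(row.getKey());
            }
        }
        return new ScanResult(examined, returned);
    }

    public static void main(String[] args) {
        SortedMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", Map.of("familyA:labelB", "42"));
        table.put("row2", Map.of("familyA:other", "x"));
        table.put("row3", Map.of("familyA:labelB", "7"));

        // Keep only rows whose familyA:labelB value is "42".
        ScanResult r = scan(table, cols -> "42".equals(cols.get("familyA:labelB")));
        System.out.println(r.rowsExamined() + " examined, returned " + r.rowsReturned());
        // prints: 3 examined, returned [row1]
    }
}
```

The filter reduces what comes back, not what is read. That is why it helps with transfer and client work but does not by itself remove the cost of visiting millions of rows.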


From: ext Ric Wang [wqt.work@gmail.com]
Sent: Tuesday, June 09, 2009 5:10 PM
To: hbase-user@hadoop.apache.org
Subject: scanner on a given column: whole table scan or just the rows that have values


My HBase table has millions of rows, and for a given column (e.g.
familyA:labelB), only a couple of thousand rows actually have values (sparse).
Now my task is to find the set of row keys whose value in
"familyA:labelB" satisfies some condition.

For that task, I open a scanner on the column "familyA:labelB" and loop
over the values of that column (I guess I'd be better off using some
kind of filter instead, but regardless...); if a value matches my
condition, I take the corresponding row key and add it to the result set.
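The loop described above can be sketched as follows. This is a hypothetical model, not HBase client code: the scanner is represented as a plain `Iterator` of (row key, value) pairs, and `ClientSideLoop`, `collectMatchingKeys` and the "greater than 10" condition are illustrative assumptions. It shows that in the client-side approach every pair the scanner yields reaches the client, and the condition is tested only after that.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Client-side version of the loop. The scanner is modeled as a hypothetical
// Iterator of (row key, familyA:labelB value) pairs; a real HBase scanner
// returns richer result objects.
public class ClientSideLoop {

    // Pulls every pair the scanner yields and keeps the row keys whose value
    // satisfies the condition. Non-matching pairs still reach this loop;
    // they are discarded only after the test.
    public static Set<String> collectMatchingKeys(Iterator<Map.Entry<String, String>> scanner,
                                                  Predicate<String> condition) {
        Set<String> result = new TreeSet<>();
        while (scanner.hasNext()) {
            Map.Entry<String, String> cell = scanner.next();
            if (condition.test(cell.getValue())) {
                result.add(cell.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> cells = List.of(
                Map.entry("row1", "42"),
                Map.entry("row3", "7"),
                Map.entry("row9", "100"));
        // Hypothetical condition: numeric value greater than 10.
        System.out.println(collectMatchingKeys(cells.iterator(),
                v -> Integer.parseInt(v) > 10));
        // prints: [row1, row9]
    }
}
```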

My questions are:

1. When the scanner loops over the column, is it scanning the whole table of
millions of rows, or mostly just the ones that actually have values for that
particular column? My guess is that it's NOT scanning the whole table, per my
very limited understanding of how column-oriented databases work; that seems
like it would be awfully inefficient. Can someone please let me know?

2. If, unfortunately, a whole-table scan does have to happen, are there
any suggestions on how I could change my table design (adding an index?) to
avoid the performance hit?
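One commonly discussed answer to question 2 is a secondary index: a second table whose row keys begin with the indexed value, so that a range lookup on the value visits only matching entries. The sketch below models that idea in memory; `ValueIndex`, the `"<value>\0<rowKey>"` key layout and the method names are hypothetical choices for illustration, not an HBase feature. It assumes values never contain `'\0'` and compare correctly as strings (numbers would need a fixed-width, zero-padded encoding), and that the application updates the index on every write to familyA:labelB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical secondary index kept alongside the main table. Each entry's key
// is "<value>\0<rowKey>", so entries are sorted by value first, then row key.
// Assumes values never contain '\0' and sort correctly as strings.
public class ValueIndex {
    private static final char SEP = '\0';
    private final NavigableSet<String> entries = new TreeSet<>();

    // Call on every write of familyA:labelB (and remove the old entry on update).
    public void put(String value, String rowKey) {
        entries.add(value + SEP + rowKey);
    }

    public void remove(String value, String rowKey) {
        entries.remove(value + SEP + rowKey);
    }

    // Row keys whose value v satisfies lo <= v < hi. Only the matching slice of
    // the index is visited, instead of every row of the main table.
    public List<String> rowsWithValueIn(String lo, String hi) {
        List<String> rows = new ArrayList<>();
        for (String key : entries.subSet(lo, true, hi, false)) {
            rows.add(key.substring(key.indexOf(SEP) + 1));
        }
        return rows;
    }

    public static void main(String[] args) {
        ValueIndex index = new ValueIndex();
        index.put("0042", "row1");
        index.put("0007", "row3");
        index.put("0100", "row9");
        index.put("0042", "row5");
        System.out.println(index.rowsWithValueIn("0010", "0100"));
        // prints: [row1, row5]
    }
}
```

The trade-off is extra work on every write (the index must be kept in sync with the main table) in exchange for reads that touch only the few thousand indexed entries rather than millions of rows.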

Thanks very much for your help!
