hbase-user mailing list archives

From Bryan Duxbury <br...@rapleaf.com>
Subject Re: Newbie user questions
Date Mon, 03 Mar 2008 04:00:10 GMT

The HBase shell is meant to be used only for administrative purposes,
like managing tables. You can do limited CRUD operations, but they're
mostly there for the benefit of initial testing and tracking down
bugs. HQL is also not SQL, so you shouldn't expect it to support
many SQL features.

In the Java, REST and Thrift APIs for HBase, there are two kinds of
access: single-row gets and multi-row scans. There are a lot of
options surrounding gets, so there's probably something that matches  
your needs, but you have to know the row key to start with. Scans are  
used whenever you need to operate on a number of rows. The cursor  
model is indeed the closest analogy for a scanner.
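To make the get/scan distinction concrete, here is a minimal sketch in plain Java. It does not use the HBase client API: ToyTable is a hypothetical in-memory stand-in that keeps rows sorted by row key, the way HBase stores them. A get needs the exact row key, while a scan walks forward from a start row like a cursor.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for an HBase table (NOT the HBase client API).
 * Rows are sorted by row key; each row maps "family:qualifier" column names to values.
 */
public class ToyTable {
    private final TreeMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    /** Stores one cell value, creating the row if it does not exist yet. */
    public void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    /** Single-row get: the caller must already know the exact row key. Returns null if absent. */
    public SortedMap<String, String> get(String row) {
        return rows.get(row);
    }

    /** Scan: a cursor-like iterator over rows in key order, starting at startRow (inclusive). */
    public Iterator<Map.Entry<String, SortedMap<String, String>>> scan(String startRow) {
        return rows.tailMap(startRow, true).entrySet().iterator();
    }

    public static void main(String[] args) {
        ToyTable t = new ToyTable();
        t.put("user1", "info:name", "Alice");
        t.put("user2", "info:name", "Bob");
        t.put("user3", "info:name", "Carol");

        System.out.println(t.get("user2")); // {info:name=Bob}

        // Iterate from "user2" onward, one row at a time.
        Iterator<Map.Entry<String, SortedMap<String, String>>> it = t.scan("user2");
        while (it.hasNext()) {
            Map.Entry<String, SortedMap<String, String>> e = it.next();
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Because rows are ordered by key, choosing row keys so that related rows sort next to each other is what lets a scan stay short.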

If you need to do a join in the traditional sense, then yes, you need  
to have at least two scanners and do the joining yourself. However,  
if possible, you might want to consider denormalizing the data from  
the two tables you'd be joining into a single table. I don't mean one  
row per <table1,table2> tuple; HBase supports an arbitrary number of
columns per row, so if your second table really describes a subordinate
entity, you may benefit from moving it all into one table.
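If you do join on the client, the fact that scans return rows sorted by key means a sort-merge join reads each input once instead of rescanning one table per row of the other. The sketch below is hypothetical and uses TreeMap iterators in place of real scanners; it assumes both tables use the join key as their row key and that each key appears at most once per table (a 1:1 join).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a client-side join over two key-sorted scans.
 * TreeMap iterators stand in for scanners; this is not the HBase client API.
 * Assumes both inputs share the join key as row key, with unique keys per side.
 */
public class ScanJoin {

    /** Sort-merge inner join: always advance the cursor holding the smaller key. */
    public static List<String> mergeJoin(Iterator<Map.Entry<String, String>> left,
                                         Iterator<Map.Entry<String, String>> right) {
        List<String> out = new ArrayList<>();
        Map.Entry<String, String> l = left.hasNext() ? left.next() : null;
        Map.Entry<String, String> r = right.hasNext() ? right.next() : null;
        while (l != null && r != null) {
            int cmp = l.getKey().compareTo(r.getKey());
            if (cmp == 0) {
                // Keys match: emit the joined row and advance both cursors.
                out.add(l.getKey() + ":" + l.getValue() + "," + r.getValue());
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            } else if (cmp < 0) {
                l = left.hasNext() ? left.next() : null;
            } else {
                r = right.hasNext() ? right.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> users = new TreeMap<>();
        users.put("u1", "Alice");
        users.put("u2", "Bob");
        users.put("u4", "Dave");

        TreeMap<String, String> emails = new TreeMap<>();
        emails.put("u2", "bob@example.com");
        emails.put("u3", "carol@example.com");
        emails.put("u4", "dave@example.com");

        System.out.println(mergeJoin(users.entrySet().iterator(), emails.entrySet().iterator()));
        // [u2:Bob,bob@example.com, u4:Dave,dave@example.com]
    }
}
```

A 1:many join would need grouping of repeated keys on one side. With the denormalized layout described above, the email would simply be another column in the user's row, and the join would collapse into a single get.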

The return values for scanners are Java Maps containing your data  
(assuming you're in the Java API). Does that answer your question?
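Since each row comes back as a Map, building your own collection for calling code is usually a small loop. The sketch below is hypothetical: it assumes a row arrives as a SortedMap from "family:qualifier" column names to raw byte[] values, which is a simplified stand-in rather than an exact HBase type, and decodes one column family into plain strings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: turning a per-row map of column name -> raw bytes into a
 * caller-friendly map. The String/byte[] row type is an assumed simplification.
 */
public class RowDecoder {

    /** Decodes every cell in one column family into a qualifier -> UTF-8 string map. */
    public static Map<String, String> decodeFamily(SortedMap<String, byte[]> row, String family) {
        Map<String, String> out = new LinkedHashMap<>(); // keeps the row's column order
        String prefix = family + ":";
        for (Map.Entry<String, byte[]> cell : row.entrySet()) {
            String column = cell.getKey();
            if (column.startsWith(prefix)) {
                out.put(column.substring(prefix.length()),
                        new String(cell.getValue(), StandardCharsets.UTF_8));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, byte[]> row = new TreeMap<>();
        row.put("info:name", "Alice".getBytes(StandardCharsets.UTF_8));
        row.put("info:city", "Paris".getBytes(StandardCharsets.UTF_8));
        row.put("stats:logins", "42".getBytes(StandardCharsets.UTF_8));

        System.out.println(decodeFamily(row, "info")); // {city=Paris, name=Alice}
    }
}
```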


On Mar 2, 2008, at 7:01 PM, alexthompson@sitelabs.com wrote:

> Newbie user questions. Can you correct me if I am wrong in any of the
> following statements?
> I have looked into querying against HBase and come up with a few
> paths to do this. From the HBase shell I can use HQL; from code I
> am limited to scanners, which are roughly analogous to cursors: I
> 'obtain' a scanner and iterate over a table starting at a row, and
> once I have a row I can test values in columns.
> Thus for a 'SQL'-type join I can fire up 2+ scanners on different
> tables and iterate over both, testing as I go. Are there performance
> problems? Is there a more efficient way to do this, or are scanners
> innately efficient?
> One other thing I can't see is the return value for a query. Do I
> build my own collection and hand it back to my calling methods, or
> are there some helper collection objects (I noticed 'formatter')
> to do this?
> Cheers, Alex. Any help much appreciated.
